This version is about 12% faster than the previous one (Time 6 vs. Time 5 in the timings below).
I can't think of a way to optimize the routine further than this. Pre-computing the function's address before running it in the loop makes each search 1 microsecond faster. That's about it, I think. Is there a faster way to pass the bytearray address, length and value?
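If "pre-computing the function's address" means binding the function to a local name before the timing loop, the idea looks roughly like the sketch below. It is written for desktop Python so it can be tried without a board. `search_py` is a hypothetical stand-in for the assembler routine and `time_searches` is an illustrative helper, not code from the original benchmark. On a pyboard you would swap `time.perf_counter()` for `time.ticks_us()` and `time.ticks_diff()`.

```python
import time

def search_py(buf, length, value):
    # Hypothetical stand-in for the assembler routine (assumption, not the
    # original code): index of the first match in buf[:length], or -1.
    return buf.find(value, 0, length)

def time_searches(buf, value, iterations=1000):
    fn = search_py      # look the function up once, outside the loop
    n = len(buf)        # same for the length argument
    result = -1
    start = time.perf_counter()
    for _ in range(iterations):
        result = fn(buf, n, value)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

buf = bytearray(1000)   # all zeros
buf[-1] = 0xFF          # worst case: the target is the last byte
result, elapsed_ms = time_searches(buf, 0xFF)
print("index", result, "time", elapsed_ms, "ms")
```

The saving comes from replacing a global name lookup on every iteration with a local variable access, so it is a fixed per-call gain that does not depend on the buffer length.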
# Assembler...take 2
@micropython.asm_thumb
def look_for6(r0, r1, r2) -> int:
    # r0 bytearray address
    # r1 bytearray length (must be at least 1)
    # r2 value to search for
    add(r3, r0, r1)     # Pre-calculate end address (one past the last byte)
    mov(r4, r0)         # Working pointer starts at first byte
    label(SEARCH)       # Start of search loop
    ldrb(r1, [r4, 0])   # Get next value from memory
    cmp(r2, r1)         # Match?
    beq(DONE)           # Yes. Done.
    add(r4, 1)          # No. Increment pointer
    cmp(r4, r3)         # If we have not reached the end address, keep going
    blt(SEARCH)         # (blt, not ble: r3 points one past the last byte)
    mov(r0, 0)          # Not found. Return -1
    sub(r0, 1)          #
    b(END)              #
    label(DONE)         # Return the index where we got a match
    sub(r0, r4, r0)     #
    label(END)          # That's all!
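To cross-check the assembler's return values off the board, a pure-Python model of the intended behaviour can help. The sketch below is an assumption about what the routine is meant to return (the index of the first match, or -1), with offsets standing in for the pointer registers. `look_for_model` is a hypothetical name. Like the assembler, it reads the first byte before checking the length, so it assumes a non-empty buffer.

```python
def look_for_model(buf, length, value):
    """Model of the intended search: index of the first match, or -1."""
    end = length                 # plays the role of r3, as an offset from r0
    ptr = 0                      # plays the role of r4, as an offset from r0
    while True:
        if buf[ptr] == value:    # ldrb + cmp + beq(DONE)
            return ptr           # DONE: working pointer minus start address
        ptr += 1                 # add(r4, 1)
        if not ptr < end:        # stop once the pointer reaches the end
            return -1            # not found

data = bytearray(b"\x10\x20\x30\x40")
print(look_for_model(data, len(data), 0x10))   # first byte
print(look_for_model(data, len(data), 0x40))   # last byte
print(look_for_model(data, len(data), 0x99))   # absent value
```

The three calls cover the boundaries that matter for the loop's end condition: a match on the first byte, a match on the last byte, and no match at all.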
Worst case search target, 1000 iterations
Time 0: 231.0 ms
Time 1: 106.0 ms
Time 2: 92.0 ms
Time 3: 91.0 ms
Time 4: 91.0 ms
Time 2a: 90.0 ms
Time 5: 32.0 ms
Time 6: 28.0 ms
Random search target, 1000 iterations
Time 0: 223.0 ms
Time 1: 98.0 ms
Time 2: 85.0 ms
Time 3: 85.0 ms
Time 4: 85.0 ms
Time 2a: 84.0 ms
Time 5: 30.0 ms
Time 6: 26.0 ms
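As a quick check of the arithmetic, the sketch below works out the speedup and the per-search time from the figures above. It assumes that Time 5 and Time 6 are the previous and the current assembler versions, which the listing does not state explicitly. `speedup_percent` is an illustrative helper.

```python
# Timings copied from the two runs above, in ms per 1000 searches.
timings_ms = {
    "worst case": {"Time 5": 32.0, "Time 6": 28.0},
    "random": {"Time 5": 30.0, "Time 6": 26.0},
}
ITERATIONS = 1000

def speedup_percent(before_ms, after_ms):
    # Fraction of the earlier time that was saved, as a percentage.
    return (before_ms - after_ms) / before_ms * 100

for case, t in timings_ms.items():
    pct = speedup_percent(t["Time 5"], t["Time 6"])
    per_search_us = t["Time 6"] * 1000 / ITERATIONS
    print(f"{case}: {pct:.1f}% faster, {per_search_us:.0f} us per search")
# worst case: 12.5% faster, 28 us per search
# random: 13.3% faster, 26 us per search
```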