Fast/efficient alternative to bytearray.index()?

martincho · Post by **martincho** » Tue Jun 07, 2022 10:38 pm

It looks like .index() does not exist for bytearray (if it matters, RP2040). I need to search for a byte in a ba type. The only fast alternative I can find is to convert to bytes() type and then use index() there. Testing on Python 2.9, at least on that implementation, this approach isn't much slower than using index on bytearrays. On my PC, both can crunch through 100K tests in about 13 ms.

I would like to avoid allocating memory and triggering garbage collection on MicroPython. I am using conversion to bytes() right now, but would really like to avoid doing this and search in place if at all possible. The bytearray buffer I am using is preallocated and never resized, etc.

Barring the availability of .index(), I would probably write a simple search routine in assembler. I can't find any documentation on how to get a pointer to the start of the data in a bytearray. I know how to dump the entire memory footprint for any data type...but that includes a header and all kinds of other information that could very well change from release to release. In other words, while I can hack this and obtain an offset to the first byte in a bytearray structure in memory right now, this is an approach that isn't likely to survive over time.

Do I have any options?

Here's some test code for silly-slow to the fastest search approach I can identify right now. The last one crashes in MicroPython.

Code: Select all

import time

def timing_test_2():
    print("Start...")
    ITERATIONS = 1000
    ba  = bytearray([n for n in range(256)])
    lst = [n for n in range(256)]

    # -------------------------------------------------------------------------------------------
    # Brute force search #1
    index = -1
    length = len(ba)
    start = time.ticks_ms()
    for test in range(ITERATIONS):
        for i in range(length):
            if ba[i] == 255:
                index = i
                break
    end = time.ticks_ms()
    print(f"Brute force #1:    {index} {time.ticks_diff(end, start):10.4f} ms")

    # -------------------------------------------------------------------------------------------
    # Brute force #2
    index = -1
    start = time.ticks_ms()
    for test in range(ITERATIONS):
        i = 0
        for num in ba:
            if num == 255:
                index = i
                break
            else:
                i += 1
    end = time.ticks_ms()
    print(f"Brute force #2:    {index} {time.ticks_diff(end, start):10.4f} ms")

    # -------------------------------------------------------------------------------------------
    # Brute force #3
    index = -1
    start = time.ticks_ms()
    for test in range(ITERATIONS):
        for i,num in enumerate(ba):
            if num == 255:
                index = i
                break
    end = time.ticks_ms()
    print(f"Brute force #3:    {index} {time.ticks_diff(end, start):10.4f} ms")


    # -------------------------------------------------------------------------------------------
    # Using memoryview()
    index = -1
    length = len(ba)
    mv = memoryview(ba)
    start = time.ticks_ms()
    for test in range(ITERATIONS):
        for i in range(length):
            if mv[i] == 255:
                index = i
                break
    end = time.ticks_ms()
    print(f"memview():         {index} {time.ticks_diff(end, start):10.4f} ms")

    # -------------------------------------------------------------------------------------------
    # Using .index() on a list rather than a bytearray()
    # Searching for last element for worst case timing
    index = -1
    start = time.ticks_ms()
    for test in range(ITERATIONS):
        index = lst.index(255)
    end = time.ticks_ms()
    print(f"list.index():      {index} {time.ticks_diff(end, start):10.4f} ms")

    # -------------------------------------------------------------------------------------------
    # Convert to bytes() and use .index() on bytes
    index = -1
    b = bytes(ba)
    start = time.ticks_ms()
    for test in range(ITERATIONS):
        index = b.index(b'\xFF')
    end = time.ticks_ms()
    print(f"bytes.index():     {index} {time.ticks_diff(end, start):10.4f} ms")

    # -------------------------------------------------------------------------------------------
    # Using .index()
    # Searching for last element for worst case timing
    index = -1
    start = time.ticks_ms()
    for test in range(ITERATIONS):
        index = ba.index(255)
    end = time.ticks_ms()
    print(f"bytearray.index(): {index} {time.ticks_diff(end, start):10.4f} ms")

    print("End")

timing_test_2()

mattyt · Post by **mattyt** » Wed Jun 08, 2022 1:30 am

How's your C? If it's reasonable it may be possible to fix the issue! No pressure, it just may be easier than a workaround.

The problem is here: py/objstr.c:702 where the type is checked (bytes are implemented using a lot of string code). Need another code path where the type is bytes and the type passed to index is int...

Should also add some tests (even a failing .index(int) would be helpful!).

martincho · Post by **martincho** » Wed Jun 08, 2022 2:57 am

mattyt wrote: ↑
Wed Jun 08, 2022 1:30 am
How's your C? If it's reasonable it may be possible to fix the issue! No pressure, it just may be easier than a workaround.

The problem is here: py/objstr.c:702 where the type is checked (bytes are implemented using a lot of string code). Need another code path where the type is bytes and the type passed to index is int...

Should also add some tests (even a failing .index(int) would be helpful!).

My C is excellent. What is lousy right now is my time. I have to finish a project that is going on 1000 boards by the end of the month. I need 48 hour days right now. I'm only able to squeeze out about 20.

I'd be happy to look into contributing after this is behind me. I have lot of ideas. It could be interesting. For now, I think I'll just work around this one.

Thanks for pointing out where in the code this is happening.

pythoncoder · Post by **pythoncoder** » Wed Jun 08, 2022 8:21 am

You haven't told us what platform you are using. If you're using a Pyboard or RP2 this would be extremely simple using the inline assembler. See this repo for examples of this. Pass the bytes object and its length as args, return the index as an int.

martincho · Post by **martincho** » Wed Jun 08, 2022 8:25 am

pythoncoder wrote: ↑
Wed Jun 08, 2022 8:21 am
You haven't told us what platform you are using. If you're using a Pyboard or RP2 this would be extremely simple using the inline assembler. See this repo for examples of this. Pass the bytes object and its length as args, return the index as an int.

Sorry. RP2040.

I'll have a look. Assembler would be perfect at this point.

EDIT:
Nice. I've done wide polyphase FIR filters in Xilinx FPGA's for real-time image processing. Nice to see what today's MCU's are able to do.

I am also wondering if I can get clever and actually make a bytes() object mutable. Once I know where the data is, who's going to stop me from changing bytes in assembler? bytes() are such a simple structure in memory it's asking for this kind of manipulation. Just a header, length, some other cruft and, at the end, the bytes and a zero for good measure.

pythoncoder · Post by **pythoncoder** » Wed Jun 08, 2022 8:47 am

RP2 supports Arm Thumb V6 assembler. The docs are for V7. The big difference is the lack of FP. The push and pop multiple registers are restricted to the lower 8 registers. But for what you want to do there should be no problems.

I haven't tried passing a bytes object to assembler but in general arrays and bytearrays are passed by reference.

chuckbook · Post by **chuckbook** » Thu Jun 09, 2022 2:13 pm

What about:

Code: Select all

from uctypes import bytes_at, addressof
b = bytes_at(addressof(ba), 256)

chuckbook · Post by **chuckbook** » Thu Jun 09, 2022 2:37 pm

Sorry for the noise, bytes_at by ref isn't in master yet.

martincho · Post by **martincho** » Thu Jun 09, 2022 5:04 pm

It took me a couple of days to get to it. I decided to compare various solutions for getting around the lack of .index() in bytearrays. Here's what I came up with.

Am I right in thinking that only a subset of ARM instructions are supported? I tried to lrdb(r3, [r3, r4]), which I think should work but I got an error. I would very much appreciate critique on any and all of this. Obviously assembler does the best job. It's interesting to see what you can achieve with viper though.

I return -1 when the value isn't found. No need to deal with exceptions.

Code: Select all

import time
import gc

from random import randint

ba = bytearray(range(256))


# Using .index() after converting everything to bytes()
def look_for0(buffer:bytearray, value:int) -> int:
    return bytes(buffer).index(bytes([value,]))

# Basic viper form
@micropython.viper
def look_for(buffer:ptr8, length:int, value:int) -> int:
    for i in range(length):
        if buffer[i] == value:
            return i
    return -1


# Some optimization
@micropython.viper
def look_for2(buffer:ptr8, length:int, value:int) -> int:
    i = 0
    while True:
        if buffer[i] == value:
            return i
        else:
            i += 1
            if i == length:
                return -1


# Is this faster?
@micropython.viper
def look_for3(buffer:ptr8, length:int, value:int) -> int:
    i = 0
    while True:
        if buffer[i] == value:
            return i
        else:
            i = i + 1   # Does viper look at this differently?
            if i == length:
                return -1


# Are short variable names better?
@micropython.viper
def look_for4(b:ptr8, l:int, v:int) -> int:
    i = 0
    while True:
        if b[i] == v:
            return i
        else:
            i += 1   # Does viper look at this differently
            if i == l:
                return -1


# Assembler...
@micropython.asm_thumb
def look_for5(r0, r1, r2) -> int:
    # r0 bytearray address
    # r1 bytearray length
    # r2 value to search for
    mov(r4, 0)          # Initialize index
    label(SEARCH)       # Start of search loop
    mov(r3, r0)         # Get bytearray start address
    add(r3, r3, r4)     # Get next value from memory
    ldrb(r3, [r3, 0])   #
    cmp(r2, r3)         # Match?
    beq(DONE)           # Yes. Done.
    add(r4, 1)          # No. Increment index.
    cmp(r4, r1)         # Keep going if we are not past the end of the bytearray
    ble(SEARCH)         #
    mov(r0, 0)          # Not found. Return -1
    sub(r0, 1)          #
    b(END)              #
    label(DONE)         # Return the index where we got a match
    mov(r0, r4)         #
    label(END)          # That's all!



# Timing tests
def test_1():
    ITERATIONS = 1000
    # target = randint(0,255)
    # target = 255

    size = len(ba)  # Pre-calculate length
    l = look_for2  # Load function address

    for i in range(2):
        if i:
            print(f"\nRandom search target, {ITERATIONS} iterations")
            target = randint(0, 255)
        else:
            print(f"\nWorst case search target, {ITERATIONS} iterations")
            target = 255

        gc.collect()
        start = time.ticks_ms()
        for test in range(ITERATIONS):
            look_for0(ba, target)
        end = time.ticks_ms()
        print(f"Time 0: {time.ticks_diff(end, start):10.1f} ms")

        gc.collect()
        start = time.ticks_ms()
        for test in range(ITERATIONS):
            look_for(ba, size, target)
        end = time.ticks_ms()
        print(f"Time 1: {time.ticks_diff(end, start):10.1f} ms")

        gc.collect()
        start = time.ticks_ms()
        for test in range(ITERATIONS):
            look_for2(ba, size, target)
        end = time.ticks_ms()
        print(f"Time 2: {time.ticks_diff(end, start):10.1f} ms")

        gc.collect()
        start = time.ticks_ms()
        for test in range(ITERATIONS):
            look_for3(ba, size, target)
        end = time.ticks_ms()
        print(f"Time 3: {time.ticks_diff(end, start):10.1f} ms")

        gc.collect()
        start = time.ticks_ms()
        for test in range(ITERATIONS):
            look_for3(ba, size, target)
        end = time.ticks_ms()
        print(f"Time 4: {time.ticks_diff(end, start):10.1f} ms")

        gc.collect()
        # This is look_for2 after with preloading the address
        start = time.ticks_ms()
        for test in range(ITERATIONS):
            l(ba, size, target)
        end = time.ticks_ms()
        print(f"Time 2a: {time.ticks_diff(end, start):9.1f} ms")

        gc.collect()
        start = time.ticks_ms()
        for test in range(ITERATIONS):
            look_for5(ba, size, target)
        end = time.ticks_ms()
        print(f"Time 5: {time.ticks_diff(end, start):10.1f} ms")


# Does it find all values?
def test_2():
    for n in range(256):
        print(f"{n} index: {look_for5(ba, len(ba), n)}")

test_1()

Results:

Code: Select all

Worst case search target, 1000 iterations
Time 0:      231.0 ms
Time 1:      106.0 ms
Time 2:       91.0 ms
Time 3:       91.0 ms
Time 4:       91.0 ms
Time 2a:      91.0 ms
Time 5:       31.0 ms

Random search target, 1000 iterations
Time 0:      199.0 ms
Time 1:       79.0 ms
Time 2:       68.0 ms
Time 3:       68.0 ms
Time 4:       69.0 ms
Time 2a:      67.0 ms
Time 5:       25.0 ms

martincho · Post by **martincho** » Thu Jun 09, 2022 5:07 pm

Anyone know how to get PyCharm to understand MicroPython assembler and not torture me with this view?

Code works fine, it's just PyCharm that isn't happy at all.

MicroPython Forum (Archive)

Fast/efficient alternative to bytearray.index()?

Fast/efficient alternative to bytearray.index()?

Re: Fast/efficient alternative to bytearray.index()?

Re: Fast/efficient alternative to bytearray.index()?

Re: Fast/efficient alternative to bytearray.index()?

Re: Fast/efficient alternative to bytearray.index()?

Re: Fast/efficient alternative to bytearray.index()?

Re: Fast/efficient alternative to bytearray.index()?

Re: Fast/efficient alternative to bytearray.index()?

Re: Fast/efficient alternative to bytearray.index()?

Re: Fast/efficient alternative to bytearray.index()?