Fast/efficient alternative to bytearray.index()?
Posted: Tue Jun 07, 2022 10:38 pm
It looks like .index() does not exist for bytearray (if it matters, RP2040). I need to search for a byte in a ba type. The only fast alternative I can find is to convert to bytes() type and then use index() there. Testing on Python 2.9, at least on that implementation, this approach isn't much slower than using index on bytearrays. On my PC, both can crunch through 100K tests in about 13 ms.
I would like to avoid allocating memory and triggering garbage collection on MicroPython. I am using conversion to bytes() right now, but would really like to avoid doing this and search in place if at all possible. The bytearray buffer I am using is preallocated and never resized, etc.
Barring the availability of .index(), I would probably write a simple search routine in assembler. I can't find any documentation on how to get a pointer to the start of the data in a bytearray. I know how to dump the entire memory footprint for any data type...but that includes a header and all kinds of other information that could very well change from release to release. In other words, while I can hack this and obtain an offset to the first byte in a bytearray structure in memory right now, this is an approach that isn't likely to survive over time.
Do I have any options?
Here's some test code for silly-slow to the fastest search approach I can identify right now. The last one crashes in MicroPython.
I would like to avoid allocating memory and triggering garbage collection on MicroPython. I am using conversion to bytes() right now, but would really like to avoid doing this and search in place if at all possible. The bytearray buffer I am using is preallocated and never resized, etc.
Barring the availability of .index(), I would probably write a simple search routine in assembler. I can't find any documentation on how to get a pointer to the start of the data in a bytearray. I know how to dump the entire memory footprint for any data type...but that includes a header and all kinds of other information that could very well change from release to release. In other words, while I can hack this and obtain an offset to the first byte in a bytearray structure in memory right now, this is an approach that isn't likely to survive over time.
Do I have any options?
Here's some test code for silly-slow to the fastest search approach I can identify right now. The last one crashes in MicroPython.
Code: Select all
import time
def timing_test_2():
print("Start...")
ITERATIONS = 1000
ba = bytearray([n for n in range(256)])
lst = [n for n in range(256)]
# -------------------------------------------------------------------------------------------
# Brute force search #1
index = -1
length = len(ba)
start = time.ticks_ms()
for test in range(ITERATIONS):
for i in range(length):
if ba[i] == 255:
index = i
break
end = time.ticks_ms()
print(f"Brute force #1: {index} {time.ticks_diff(end, start):10.4f} ms")
# -------------------------------------------------------------------------------------------
# Brute force #2
index = -1
start = time.ticks_ms()
for test in range(ITERATIONS):
i = 0
for num in ba:
if num == 255:
index = i
break
else:
i += 1
end = time.ticks_ms()
print(f"Brute force #2: {index} {time.ticks_diff(end, start):10.4f} ms")
# -------------------------------------------------------------------------------------------
# Brute force #3
index = -1
start = time.ticks_ms()
for test in range(ITERATIONS):
for i,num in enumerate(ba):
if num == 255:
index = i
break
end = time.ticks_ms()
print(f"Brute force #3: {index} {time.ticks_diff(end, start):10.4f} ms")
# -------------------------------------------------------------------------------------------
# Using memoryview()
index = -1
length = len(ba)
mv = memoryview(ba)
start = time.ticks_ms()
for test in range(ITERATIONS):
for i in range(length):
if mv[i] == 255:
index = i
break
end = time.ticks_ms()
print(f"memview(): {index} {time.ticks_diff(end, start):10.4f} ms")
# -------------------------------------------------------------------------------------------
# Using .index() on a list rather than a bytearray()
# Searching for last element for worst case timing
index = -1
start = time.ticks_ms()
for test in range(ITERATIONS):
index = lst.index(255)
end = time.ticks_ms()
print(f"list.index(): {index} {time.ticks_diff(end, start):10.4f} ms")
# -------------------------------------------------------------------------------------------
# Convert to bytes() and use .index() on bytes
index = -1
b = bytes(ba)
start = time.ticks_ms()
for test in range(ITERATIONS):
index = b.index(b'\xFF')
end = time.ticks_ms()
print(f"bytes.index(): {index} {time.ticks_diff(end, start):10.4f} ms")
# -------------------------------------------------------------------------------------------
# Using .index()
# Searching for last element for worst case timing
index = -1
start = time.ticks_ms()
for test in range(ITERATIONS):
index = ba.index(255)
end = time.ticks_ms()
print(f"bytearray.index(): {index} {time.ticks_diff(end, start):10.4f} ms")
print("End")
timing_test_2()