Most efficient searching within large bytearrays

marc4444
Posts: 14
Joined: Sat Aug 03, 2019 12:16 pm

Most efficient searching within large bytearrays

Post by marc4444 » Sun Mar 15, 2020 6:28 pm

Hi All,

I've created a large (2000-byte) bytearray with a memoryview to read a UART into. The commands that the UART receives can sometimes take a while to arrive, so I'd like to check in a loop for a specific terminating character sequence within the bytearray.

I've noticed that MicroPython doesn't support the .find() method. What is the most efficient way of doing this without creating a copy of the bytearray? Ideally I could specify a start/end index too, like the find method allows. I have something rough below that I'm using, but it doesn't seem very efficient.

Code:

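def has_substring(input_string, pattern, start=0, end=-1):
    # Naive search for pattern within input_string[start:end].
    # Returns the index just past the end of the match, or -1 if not found.
    j = 0        # position within pattern
    i = start    # position within input_string
    k = start    # start of the current candidate match (was 0, which ignored start)
    if end == -1:
        l = len(input_string)
    else:
        l = end
    m = len(pattern)
    while i < l and j < m:
        if input_string[i] == pattern[j]:
            i += 1
            j += 1
        else:
            # Mismatch: restart one position after the previous candidate.
            j = 0
            k += 1
            i = k
    if j == m:
        return i
    else:
        return -1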
Thanks in advance for the help!

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Most efficient searching within large bytearrays

Post by Roberthh » Sun Mar 15, 2020 7:59 pm

find is supported. How did you try it?

marc4444
Posts: 14
Joined: Sat Aug 03, 2019 12:16 pm

Re: Most efficient searching within large bytearrays

Post by marc4444 » Sun Mar 15, 2020 8:05 pm

Hi Robert,

Thanks for the quick reply. See the REPL output below. I'm just using a pyboard; do I have some old version or something? I should have added to my original post that I'm trying to use .find() on the bytearray.

Thanks!

Output from the pyboard:

Code:

>>> a = bytearray(20)
>>> a.find(b'\r\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytearray' object has no attribute 'find'
>>> print(type(a))
<class 'bytearray'>
>>> print(dir(a))
['__class__', 'append', 'extend', 'decode']

Output from Python 3 on a PC:

Code:

>>> a = bytearray(b'qwerty')
>>> a.find(b'ty')
4

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Most efficient searching within large bytearrays

Post by Roberthh » Sun Mar 15, 2020 8:59 pm

You are right, in that there is no find for bytearrays. I looked at strings and bytes objects.

marc4444
Posts: 14
Joined: Sat Aug 03, 2019 12:16 pm

Re: Most efficient searching within large bytearrays

Post by marc4444 » Mon Mar 16, 2020 9:25 am

Thanks Robert. Any ideas from anyone on a more efficient way than the function I posted would be appreciated.

Best,
Marc

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Most efficient searching within large bytearrays

Post by Roberthh » Mon Mar 16, 2020 10:13 am

You can compare slices of the buffer against the pattern for equality. Using a memoryview for the slices avoids allocating a copy. But for short patterns the overhead may be similar; only the code looks smaller. You can also use something like

pattern in input_string

as a fast test of whether the pattern is in the buffer at all. If input_string were not so large, then bytes(input_string).find(pattern) would be useful. But bytes() creates a temporary object.
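
A minimal sketch of the slice-comparison idea described above, assuming the buffer is a bytearray and the pattern is a bytes object. The name find_in_buffer and its parameters are hypothetical, not part of MicroPython; the function only mimics the start/end arguments of CPython's find.

```python
def find_in_buffer(buf, pattern, start=0, end=None):
    # Hypothetical helper: returns the index of the first match of pattern
    # in buf[start:end], or -1 if there is none.
    if end is None:
        end = len(buf)
    m = len(pattern)
    if m == 0:
        return start
    mv = memoryview(buf)   # slices of mv refer to buf's data, they do not copy it
    first = pattern[0]
    for i in range(start, end - m + 1):
        # Cheap single-byte check first, full slice comparison only on a hit.
        if buf[i] == first and mv[i:i + m] == pattern:
            return i
    return -1


buf = bytearray(b'ab\r\ncd\r\n')
print(find_in_buffer(buf, b'\r\n'))      # first terminator
print(find_in_buffer(buf, b'\r\n', 3))   # search again from index 3
```

Each memoryview slice still creates a small view object, so this reduces copying rather than eliminating allocation entirely. Whether it beats the hand-written loop for a two-byte pattern would need to be measured on the board.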

TomLin
Posts: 4
Joined: Thu Apr 22, 2021 8:47 am

Re: Most efficient searching within large bytearrays

Post by TomLin » Thu Apr 22, 2021 9:08 am

I have the same problem. Tried above mentioned solution: bytes(input_string).find(pattern) but it results in error.

Here is my sample code (with only essential lines shown):

buf = bytearray(255) #initialize UART input buffer as bytearray
resp2 = uart.readinto(buf) #read UART input into buf
print (bytes(buf).find('OK')) #test if buf contains a substring 'OK'

The last line raises an error:
TypeError: can't convert 'str' object to bytes implicitly
MicroPython v1.11 on 2019-05-29; PYBv1.1 with STM32F405RG

Can you please advise me how to correct this?

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Most efficient searching within large bytearrays

Post by Roberthh » Thu Apr 22, 2021 9:20 am

print (bytes(buf).find(b'OK')) #test if buf contains a substring 'OK'

Note: bytes(buf) creates a copy of buf.
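
If the copy is acceptable, one way to keep it small is to copy only the bytes that were actually received rather than the whole buffer. The sketch below assumes nbytes is the value returned by uart.readinto(buf), which is None on a timeout; find_in_received is a hypothetical helper name, not an existing API.

```python
def find_in_received(buf, nbytes, pattern):
    # Hypothetical helper: search only the first nbytes of buf.
    # nbytes is assumed to be the return value of uart.readinto(buf),
    # i.e. the number of bytes stored, or None if nothing arrived.
    if not nbytes:
        return -1
    # The memoryview slice does not copy; bytes() then copies nbytes bytes,
    # not the full length of buf.
    return bytes(memoryview(buf)[:nbytes]).find(pattern)


buf = bytearray(255)
buf[:4] = b'OK\r\n'                      # stand-in for data read from the UART
print(find_in_received(buf, 4, b'OK'))
```

This still allocates a temporary bytes object on every call, so it limits the size of the copy but does not remove it.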

TomLin
Posts: 4
Joined: Thu Apr 22, 2021 8:47 am

Re: Most efficient searching within large bytearrays

Post by TomLin » Thu Apr 22, 2021 11:59 am

Oh yes, of course! Thank you so much for your quick answer.

Concerning the copy of the bytearray buf: it is something of a waste of memory, although that could be tolerated. But if I put this statement inside a loop, will the extra memory be reused on consecutive passes through the loop?
