Most efficient searching within large bytearrays

The official pyboard running MicroPython.
This is the reference design and main target board for MicroPython.
You can buy one at the store.
Target audience: Users with a pyboard.
marc4444
Posts: 11
Joined: Sat Aug 03, 2019 12:16 pm

Most efficient searching within large bytearrays

Post by marc4444 » Sun Mar 15, 2020 6:28 pm

Hi All,

I've created a large (2000-byte) bytearray with a memoryview to read a UART into. The commands the UART is receiving can sometimes take a while to arrive, so I'd like to check for a specific finishing character sequence within the bytearray in a loop.

I've noticed that MicroPython doesn't support the .find() method. What is the most efficient way of doing this without creating a copy of the bytearray? Ideally I could specify a start/finish index too, like the find method. I have something rough below that I'm using, but it doesn't seem very efficient.

Code:

def has_substring(input_string, pattern, start=0, end=-1):
    # Naive scan: returns the index just past the first match, or -1.
    i = start            # current position in input_string
    j = 0                # current position in pattern
    k = start            # where this match attempt began (starting at 0 broke start > 0)
    l = len(input_string) if end == -1 else end
    m = len(pattern)
    while i < l and j < m:
        if input_string[i] == pattern[j]:
            i += 1
            j += 1
        else:
            j = 0        # mismatch: restart one position past the last attempt
            k += 1
            i = k
    return i if j == m else -1
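For context, here's a rough sketch of the read loop I mean. The name read_command, the stream argument, and the slice-compare terminator check are illustrative, not my actual code; io.BytesIO stands in for the UART so it runs off-board.

```python
import io  # BytesIO stands in for the UART in the example at the bottom

def read_command(stream, buf, terminator=b'\r\n'):
    # Sketch only: fill preallocated `buf` from `stream` until `terminator`
    # arrives. `stream` is anything with readinto() -- a pyb.UART on the
    # board, io.BytesIO here. Returns the index just past the terminator,
    # or -1 if buf fills up first.
    mv = memoryview(buf)                  # readinto() fills slices copy-free
    tlen = len(terminator)
    filled = 0
    while filled < len(buf):
        n = stream.readinto(mv[filled:])
        if not n:                         # a UART timeout gives None/0
            continue                      # nothing arrived yet; keep polling
        filled += n
        # rescan only the stretch that could hold a newly completed match
        for i in range(max(0, filled - n - tlen + 1), filled - tlen + 1):
            if mv[i:i + tlen] == terminator:
                return i + tlen
    return -1

# off-board usage with an in-memory stream standing in for the UART
rxbuf = bytearray(2000)
end = read_command(io.BytesIO(b'PING\r\n'), rxbuf)   # end == 6
```

On the board the call would look the same with a pyb.UART in place of the BytesIO.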
Thanks in advance for the help!

Roberthh
Posts: 1889
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Most efficient searching within large bytearrays

Post by Roberthh » Sun Mar 15, 2020 7:59 pm

find is supported. How did you try?

marc4444
Posts: 11
Joined: Sat Aug 03, 2019 12:16 pm

Re: Most efficient searching within large bytearrays

Post by marc4444 » Sun Mar 15, 2020 8:05 pm

Hi Robert,

Thanks for the quick reply; see the REPL output below. I'm just using a pyboard; do I have some old version or something? I should have added to my original post that I'm trying to do the .find() on the bytearray.

Thanks!

Print from Pyboard:

Code:

>>> a = bytearray(20)
>>> a.find(b'\r\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytearray' object has no attribute 'find'
>>> print(type(a))
<class 'bytearray'>
>>> print(dir(a))
['__class__', 'append', 'extend', 'decode']
Print on python 3 on PC:

Code:

>>> a = bytearray(b'qwerty')
>>> a.find(b'ty')
4

Roberthh
Posts: 1889
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Most efficient searching within large bytearrays

Post by Roberthh » Sun Mar 15, 2020 8:59 pm

You are right: there is no find for bytearrays. I had looked at strings and bytes objects.

marc4444
Posts: 11
Joined: Sat Aug 03, 2019 12:16 pm

Re: Most efficient searching within large bytearrays

Post by marc4444 » Mon Mar 16, 2020 9:25 am

Thanks Robert. Any ideas from anyone on a more efficient way than the function I posted would be appreciated.

Best,
Marc

Roberthh
Posts: 1889
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Most efficient searching within large bytearrays

Post by Roberthh » Mon Mar 16, 2020 10:13 am

You can use comparison on substrings, like equality. Using a memoryview avoids copying the data. But for short patterns the overhead may be similar; the code just looks tidier. You can also use something like

pattern in input_string

as a fast test of whether the pattern is in the buffer at all. If input_string were not so large, then bytes(input_string).find(pattern) would be useful, but bytes() creates a temporary copy.
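A minimal sketch of that slice-comparison idea (the function name, signature, and defaults are mine, not an existing API):

```python
def mv_find(buf, pattern, start=0, end=None):
    # find()-style search over a buffer without copying its contents:
    # memoryview slicing gives windows into buf, and comparing a
    # memoryview slice against bytes compares the underlying data.
    mv = memoryview(buf)
    plen = len(pattern)
    if end is None:
        end = len(buf)
    for i in range(start, end - plen + 1):
        if mv[i:i + plen] == pattern:
            return i          # index where the match starts, like find()
    return -1
```

Each iteration still allocates a small memoryview object for the slice, but the buffer data itself is never copied, so it stays cheap even for a 2000-byte buffer.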
