Most efficient searching within large bytearrays

Posted: Sun Mar 15, 2020 6:28 pm
by marc4444
Hi All,

I've created a large (2000-byte) bytearray with a memoryview to read a UART into. The commands that the UART receives can sometimes take a while to arrive, so I'd like to check for a specific terminating character sequence within the bytearray in a loop.

I've noticed that MicroPython doesn't support the .find() method. What is the most efficient way of doing this without creating a copy of the bytearray? Ideally I could specify a start/end index too, like the find method. I have something rough below that I'm using, but it doesn't seem very efficient.

Code:

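def has_substring(input_string, pattern, start=0, end=-1):
    # Naive search for pattern in input_string[start:end].
    # Returns the index just past the end of the first match, or -1.
    j = 0      # position in pattern
    i = start  # position in input_string
    k = start  # start of the current candidate match (was 0, which ignored start)
    if end == -1:
        l = len(input_string)
    else:
        l = end
    m = len(pattern)
    while i < l and j < m:
        if input_string[i] == pattern[j]:
            i += 1
            j += 1
        else:
            # Mismatch: retry from one position after the last candidate.
            j = 0
            k += 1
            i = k
    if j == m:
        return i  # note: find() would return the start of the match, i - m
    else:
        return -1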
Thanks in advance for the help!

Re: Most efficient searching within large bytearrays

Posted: Sun Mar 15, 2020 7:59 pm
by Roberthh
find is supported. How did you try it?

Re: Most efficient searching within large bytearrays

Posted: Sun Mar 15, 2020 8:05 pm
by marc4444
Hi Robert,

Thanks for the quick reply. See the REPL output below. I'm just using a pyboard; do I have an old version or something? I should have added to my original post that I'm trying to call .find() on the bytearray.

Thanks!

Output from the Pyboard:

Code:

>>> a = bytearray(20)
>>> a.find(b'\r\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytearray' object has no attribute 'find'
>>> print(type(a))
<class 'bytearray'>
>>> print(dir(a))
['__class__', 'append', 'extend', 'decode']

Output from Python 3 on a PC:

Code:

>>> a = bytearray(b'qwerty')
>>> a.find(b'ty')
4

Re: Most efficient searching within large bytearrays

Posted: Sun Mar 15, 2020 8:59 pm
by Roberthh
You are right, there is no find for bytearrays. I had looked at strings and bytes objects.

Re: Most efficient searching within large bytearrays

Posted: Mon Mar 16, 2020 9:25 am
by marc4444
Thanks Robert. Does anyone have ideas for a more efficient way than the function I posted? Any suggestions are appreciated.

Best,
Marc

Re: Most efficient searching within large bytearrays

Posted: Mon Mar 16, 2020 10:13 am
by Roberthh
You can compare slices of the buffer against the pattern, for example for equality. Using a memoryview avoids allocation. But for short patterns the overhead may be similar; only the code looks smaller. You can also use something like

pattern in input_string

as a fast test of whether the pattern is in the buffer at all. If input_string were not so large, bytes(input_string).find(pattern) would be useful, but bytes() creates a temporary object.
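
The slice-comparison idea above could be sketched roughly as follows. This is only an illustration, not a measured or definitive solution: the name find_in_buffer and its parameters are made up for this example, and it assumes that comparing a memoryview slice with a bytes pattern using == is supported on the target port. Slicing a memoryview does not copy the data, but it still creates a small view object, so the sketch checks the first byte before slicing.

```python
def find_in_buffer(buf, pattern, start=0, end=None):
    """Return the index of the first occurrence of pattern in buf[start:end], or -1.

    Hypothetical helper: buf is a bytearray, pattern is a non-str bytes-like object.
    """
    if end is None:
        end = len(buf)
    m = len(pattern)
    if m == 0:
        return start if start <= end else -1
    view = memoryview(buf)  # shares the buffer's memory; the data is not copied
    first = pattern[0]
    for i in range(start, end - m + 1):
        # Cheap single-byte check first, then compare the whole window.
        if buf[i] == first and view[i:i + m] == pattern:
            return i
    return -1
```

For example, find_in_buffer(bytearray(b'qwerty'), b'ty') gives 4, matching the CPython .find() result shown earlier in the thread. Whether this is actually faster than the hand-written loop on a given board would need to be timed.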

Re: Most efficient searching within large bytearrays

Posted: Thu Apr 22, 2021 9:08 am
by TomLin
I have the same problem. I tried the above-mentioned solution, bytes(input_string).find(pattern), but it results in an error.

Here is my sample code (with only essential lines shown):

buf = bytearray(255) #initialize UART input buffer as bytearray
resp2 = uart.readinto(buf) #read UART input into buf
print (bytes(buf).find('OK')) #test if buf contains a substring 'OK'

The last line raises an error:
TypeError: can't convert 'str' object to bytes implicitly
MicroPython v1.11 on 2019-05-29; PYBv1.1 with STM32F405RG

Can you please advise me how to correct this?

Re: Most efficient searching within large bytearrays

Posted: Thu Apr 22, 2021 9:20 am
by Roberthh
print (bytes(buf).find(b'OK')) #test if buf contains a substring 'OK'

Note: bytes(buf) creates a copy of buf.
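
One way to keep that copy small is to copy only the bytes that were actually received rather than the whole buffer. The sketch below is an assumption-laden illustration: find_in_received is a hypothetical helper, n stands for the byte count that a call such as uart.readinto(buf) would return, and the UART data is simulated with a literal so the example runs without hardware. It also encodes a str pattern to bytes, which avoids the TypeError from the earlier post.

```python
def find_in_received(buf, n, pattern):
    """Search only the first n bytes of buf for pattern; return index or -1.

    Hypothetical helper: n is assumed to be the number of bytes received.
    """
    if isinstance(pattern, str):
        pattern = pattern.encode()  # find() on bytes needs a bytes pattern
    # The memoryview slice itself copies no data; bytes() then copies n bytes only.
    return bytes(memoryview(buf)[:n]).find(pattern)


buf = bytearray(255)           # UART input buffer, as in the post above
received = b'AT\r\nOK\r\n'     # stand-in for data a UART would deliver
n = len(received)
buf[:n] = received             # simulates what readinto() would have written
pos = find_in_received(buf, n, 'OK')
print(pos)
```

If no data has arrived, n would need to be treated as 0 before calling the helper.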

Re: Most efficient searching within large bytearrays

Posted: Thu Apr 22, 2021 11:59 am
by TomLin
Oh yes, of course! Thank you so much for your quick answer.

As for making a copy of the bytearray buf, it is something of a waste of memory, though one that could be tolerated. But if I put this statement inside a loop, will the extra memory be reused on consecutive passes through the loop?