Sensible regular expressions on a micro

clack
Posts: 20
Joined: Sat May 03, 2014 11:09 am

Sensible regular expressions on a micro

Post by clack » Wed Mar 04, 2015 12:24 pm

I am trying to parse a text file; the problem is that I want to do it as fast as possible.

I have elements like this:
# is a comment
speed=/loop= are pre-defined variables, e.g. speed=100
0 0 0 0 0 0 0 0 is a data sequence, e.g. '1 3 5 2 3 6 2 3'
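For reference, a data.txt with those three kinds of line might look like this (made-up contents, just for illustration):

Code: Select all

# set up the run
speed=100
loop=4
1 3 5 2 3 6 2 3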

First I did this:

Code: Select all

import ure

with open("data.txt") as sequence:
    for lineNumber, line in enumerate(sequence):
        print(lineNumber)
        # test whether the line is a comment or a blank line
        if ure.match('#', line) is None or ure.match('\n', line) is None:
            pass  # etc etc
but I thought that if I am calling ure functions multiple times per line it is going to get sluggish. So maybe instead I should make one mega expression and assign the result to a variable.

Code: Select all

import ure

line = '1 2 3 4 5 6 7 8'
# one big expression covering comments, speed=, loop= and an 8-number data sequence
bob = ure.match('(#)? *((speed) *= *([0-9]*))? *((loop) *= *([0-9]*))?( *([0-9]) *([0-9]) *([0-9]) *([0-9]) *([0-9]) *([0-9]) *([0-9]) *([0-9])*)?', line)

for i in range(17):
    print("group %d : %s" % (i, bob.group(i)))
So I made the test above, which works in regular Python. But when I try it out in MicroPython I get this:

Code: Select all

group 0 : 1 2 3 4 5 6 7 8
Traceback (most recent call last):
  File "main.py", line 7, in <module>
MemoryError: memory allocation failed, allocating 4262065870 bytes
I think this is a problem with MicroPython, as I posted here:
https://github.com/micropython/micropython/issues/1122

Does anyone have an idea of how I could do this in a far more efficient way that avoids this issue?

kfricke
Posts: 342
Joined: Mon May 05, 2014 9:13 am
Location: Germany

Re: Sensible regular expressions on a micro

Post by kfricke » Wed Mar 04, 2015 5:21 pm

To cover your speed requirement with regular expressions you of course have to precompile them. Maybe you ran into a bug in this module; trying to allocate 4 GBytes of RAM looks like a bug somewhere. You should file a bug report with a more concrete example here.
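For example, something along these lines (just a sketch of what I mean, untested; the simplified pattern and names are made up):

Code: Select all

import ure

# compile the pattern once, outside the loop, instead of re-parsing it per line
speed_re = ure.compile('speed *= *([0-9]+)')

with open("data.txt") as f:
    for line in f:
        m = speed_re.match(line)
        if m:
            print("speed is", m.group(1))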

Maybe until then you can use a simpler approach and check each line traditionally: for comments (the first character of the line decides this), for speed/loop definitions (the occurrence of '='), and finally simply split the data sequences using the fast string split method:

Code: Select all

sequences.split(b' ')
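Roughly like this, for example (only a rough, untested sketch of that idea; variable names are made up):

Code: Select all

speed = 0
loop = 0
sequences = []

with open("data.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith('#'):
            continue                      # blank line or comment
        elif '=' in line:
            name, value = line.split('=', 1)
            name = name.strip()
            if name == 'speed':
                speed = int(value)
            elif name == 'loop':
                loop = int(value)
        else:
            # data sequence: plain whitespace split, no regex needed
            sequences.append([int(x) for x in line.split()])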
edit: I do smell a bug here...
