about re module

General discussions and questions about development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
wwsheldons
Posts: 31
Joined: Wed Dec 02, 2015 1:47 pm

about re module

Post by wwsheldons » Fri Jun 03, 2016 11:57 am

If I run
tmp=re.match('^abc(.*?)cde$','abc123213cde')
tmp.group(1)
it prints '123213'.
But if I run
a=b'\r\nSEND OK\r\nAT+CIPSEND=125\r\n~011200g\x8a\xb0\xd3F\x0b\xc9\x88\x1e\x9d\xaf\x14\x9f\xfb\x9fR\xcb\xc5\xb4"\xf8>\xff\x0ftc t,\x18\xfa\xad\xf7!0V\xb6\xa6Y\xefH8\x0e\x17\xd4\xc8s\x0f\x04\xd5\xb9+\xbev\x8ej\x0b\xb8.Zf\xe0}\x01\x8f!y_%\xe9\xacO\xf64\xd3K\x06N\x99\xeeR\x92\x7fw7|\x90\x95r\xe8q\xbdnf\x137w=j\x85U\xc6\x93\xceb*\x8d\x96\xeaw\xa1\xee\xf8\r\n> \xd1\xae\xd4\x89~\r\nSEND OK\r\n~003200\xb5\xe6\x946\xcc\n\x11\x9d\xf0]\xd7i\x88\xdc}\x02<\xf1\xd5\x03\xe8\x0b\x14\xdaM\xb8G\x03F\x93\xd6\x8fg\x00\xfa\x86\x04~'
tmp=re.match('^~0(.*?)~$',a)
tmp.group(1)
re.match() returns None. Can anyone tell me why?

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: about re module

Post by Roberthh » Fri Jun 03, 2016 2:57 pm

re.match() is anchored to the start of the string. You have to use re.search() if the pattern is in the middle of the string. And in re.search(), '^' and '$' mean the beginning and end of the string, not line-break characters like \r or \n.
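
A minimal sketch of that difference, assuming CPython's re module (where the pattern must be bytes when the subject is bytes) and a shortened, made-up buffer instead of the original data:

```python
import re

# Hypothetical, shortened stand-in for the modem output in the question.
data = b"\r\nSEND OK\r\n~0123~"

# match() only tries position 0, and the buffer starts with "\r\n".
anchored = re.match(rb"~0(.*?)~", data)

# search() scans forward until the pattern fits somewhere.
found = re.search(rb"~0(.*?)~", data)

# Without re.MULTILINE, "^" means "start of the buffer", so this fails too.
caret = re.search(rb"^~0(.*?)~", data)

print(anchored, caret)   # both are None
print(found.group(1))    # the bytes between "~0" and the next "~"
```

So dropping the '^' and '$' anchors matters as much as switching from match() to search().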

wwsheldons
Posts: 31
Joined: Wed Dec 02, 2015 1:47 pm

Re: about re module

Post by wwsheldons » Fri Jun 03, 2016 5:36 pm

Both match() and search() give the same result. It still does not work :cry:

SpotlightKid
Posts: 463
Joined: Wed Apr 08, 2015 5:19 am

Re: about re module

Post by SpotlightKid » Fri Jun 03, 2016 6:36 pm

wwsheldons wrote:who can tell me why?
Me. It is because your second regex tries to match '~0' at the beginning ('^') of the string, but the value of a starts with '\r\n', so it doesn't match. If you want to match at the beginning of a line, you need the re.MULTILINE flag. I'm not sure whether the MicroPython re module supports that.

https://docs.python.org/3/library/re.ht ... e-contents
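
For illustration, a small sketch of the re.MULTILINE behaviour in CPython. The sample buffer is made up, and whether MicroPython's re module accepts this flag is not confirmed here:

```python
import re

# Hypothetical sample: one "~0...~" frame on its own line, CRLF line endings.
data = b"\r\nSEND OK\r\n~0123~\r\nSEND OK\r\n"

# With re.MULTILINE, "^" also matches right after each "\n" and "$" right
# before each "\n". The optional "\r" absorbs the carriage return of CRLF.
m = re.search(rb"^~0(.*?)~\r?$", data, re.MULTILINE)

print(m.group(1) if m else None)
```

Note that this only helps when the payload itself contains no newline, because '.' does not match '\n' in CPython unless re.DOTALL is also given.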

wwsheldons
Posts: 31
Joined: Wed Dec 02, 2015 1:47 pm

Re: about re module

Post by wwsheldons » Sat Jun 04, 2016 3:30 am

tmp=re.match('^abc(.*?)cde$','abc123213cde')
works fine.

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: about re module

Post by Roberthh » Sat Jun 04, 2016 7:40 am

Besides the problem of anchored and non-anchored search, there seems to be a problem with the data.
I tried to split a into lines and search/match these against the search pattern, with no success.
Using a text with only ASCII characters works, so it may be related to what you are searching in. Trying something like:

Code:

a=b'\r\nSEND OK\r\nAT+CIPSEND=125\r\n~011200g\x8a\xb0\xd3F\x0b\xc9\x88\x1e\x9d\xaf\x14\x9f\xfb\x9fR\xcb\xc5\xb4"\xf8>\xff\x0ftc t,\x18\xfa\xad\xf7!0V\xb6\xa6Y\xefH8\x0e\x17\xd4\xc8s\x0f\x04\xd5\xb9+\xbev\x8ej\x0b\xb8.Zf\xe0}\x01\x8f!y_%\xe9\xacO\xf64\xd3K\x06N\x99\xeeR\x92\x7fw7|\x90\x95r\xe8q\xbdnf\x137w=j\x85U\xc6\x93\xceb*\x8d\x96\xeaw\xa1\xee\xf8\r\n> \xd1\xae\xd4\x89~\r\nSEND OK\r\n~003200\xb5\xe6\x946\xcc\n\x11\x9d\xf0]\xd7i\x88\xdc}\x02<\xf1\xd5\x03\xe8\x0b\x14\xdaM\xb8G\x03F\x93\xd6\x8fg\x00\xfa\x86\x04~'
for l in a.decode().split("\r"):
    tmp = re.search("~0", l.lstrip("\n"))
gets a match, but:

Code:

tmp=re.search('~0(.*?)~', l.lstrip("\n"))
does not. It does not matter whether search or match is used in that case. The problem is the null bytes (\x00) in the string.
You may change the loop into:

Code:

for l in a.decode().split("\r"):
    ll = repr(l.lstrip("\n"))
    tmp = re.search('~0(.*?)~', ll)
    if tmp:
        print("match", tmp.group(1))
Then it works, but the string being searched has been changed by repr(), so the matched group holds escaped text rather than the raw data.
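
An alternative that avoids both the regex engine and the repr() conversion is to look for the frame markers with bytes.find(). This is only a sketch of a different technique, not what the code above does: extract_frames is a hypothetical helper, the sample buffer is made up, and it assumes a payload never contains a '~' byte.

```python
def extract_frames(buf, start=b"~0", end=b"~"):
    """Return the raw payloads found between start and end markers in buf."""
    frames = []
    pos = 0
    while True:
        begin = buf.find(start, pos)
        if begin < 0:
            break  # no further start marker
        payload_start = begin + len(start)
        finish = buf.find(end, payload_start)
        if finish < 0:
            break  # start marker without a closing marker: incomplete frame
        frames.append(buf[payload_start:finish])
        pos = finish + len(end)
    return frames


# Hypothetical buffer whose first payload contains a null byte and a newline.
sample = b"\r\nSEND OK\r\n~0032\x00\xfa\n\x04~\r\n~01ab~"
frames = extract_frames(sample)
print(frames)
```

Because the buffer stays a bytes object, the payload comes back unchanged, including null bytes and newlines.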

wwsheldons
Posts: 31
Joined: Wed Dec 02, 2015 1:47 pm

Re: about re module

Post by wwsheldons » Tue Jun 07, 2016 11:35 am

thanks!
