about re module

wwsheldons · Post by **wwsheldons** » Fri Jun 03, 2016 11:57 am

If I run
tmp=re.match('^abc(.*?)cde$','abc123213cde')
tmp.group(1)
it prints '123213'.
But if I run
a=b'\r\nSEND OK\r\nAT+CIPSEND=125\r\n~011200g\x8a\xb0\xd3F\x0b\xc9\x88\x1e\x9d\xaf\x14\x9f\xfb\x9fR\xcb\xc5\xb4"\xf8>\xff\x0ftc t,\x18\xfa\xad\xf7!0V\xb6\xa6Y\xefH8\x0e\x17\xd4\xc8s\x0f\x04\xd5\xb9+\xbev\x8ej\x0b\xb8.Zf\xe0}\x01\x8f!y_%\xe9\xacO\xf64\xd3K\x06N\x99\xeeR\x92\x7fw7|\x90\x95r\xe8q\xbdnf\x137w=j\x85U\xc6\x93\xceb*\x8d\x96\xeaw\xa1\xee\xf8\r\n> \xd1\xae\xd4\x89~\r\nSEND OK\r\n~003200\xb5\xe6\x946\xcc\n\x11\x9d\xf0]\xd7i\x88\xdc}\x02<\xf1\xd5\x03\xe8\x0b\x14\xdaM\xb8G\x03F\x93\xd6\x8fg\x00\xfa\x86\x04~'
tmp=re.match('^~0(.*?)~$',a)
tmp.group(1)
it returns None. Who can tell me why?

Roberthh · Post by **Roberthh** » Fri Jun 03, 2016 2:57 pm

re.match() is anchored to the start of the string. If the pattern occurs in the middle of the string, you have to use re.search(). And in re.search(), '^' and '$' mean the beginning and end of the string; they do not match at line-break characters like \r or \n.
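
A minimal sketch of that difference, assuming CPython's `re` module (MicroPython's `re` implements only a subset, so behaviour there may differ). The sample string is made up for illustration and is not the poster's data:

```python
import re

# Hypothetical sample: the frame "~0123~" sits in the middle of the text.
text = "SEND OK\r\n~0123~\r\nSEND OK"

# re.match() only tries the pattern at position 0, so it fails here.
anchored = re.match("~0(.*?)~", text)

# re.search() scans forward and finds the frame wherever it starts.
found = re.search("~0(.*?)~", text)

# Without re.MULTILINE, '^' matches only at the very start of the string,
# not after "\r\n", so this search fails as well.
caret = re.search("^~0(.*?)~", text)

print(anchored)        # None
print(found.group(1))  # 123
print(caret)           # None
```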

wwsheldons · Post by **wwsheldons** » Fri Jun 03, 2016 5:36 pm

Roberthh wrote:re.match() is anchored to the start of the string. If the pattern occurs in the middle of the string, you have to use re.search(). And in re.search(), '^' and '$' mean the beginning and end of the string; they do not match at line-break characters like \r or \n.

The functions match and search give the same result. Neither works.

SpotlightKid · Post by **SpotlightKid** » Fri Jun 03, 2016 6:36 pm

wwsheldons wrote:who can tell me why?

Me. It is because your second regex tries to match '~0' at the beginning ('^') of the string, but the value of a does not start with '~0'. If you want to match at the beginning of a line, you need the re.MULTILINE flag. Not sure if the MicroPython re module supports that.

https://docs.python.org/3/library/re.ht ... e-contents
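
As a rough illustration of the re.MULTILINE suggestion, here is a sketch that assumes CPython's `re` module; whether MicroPython's `re` accepts this flag is left open above. The sample text is invented, with "\r\n" line endings like those in the original data:

```python
import re

# Hypothetical sample with "\r\n" line endings, as a modem might send.
text = "SEND OK\r\n~0123~\r\nSEND OK"

# With re.MULTILINE, '^' also matches right after each "\n".
start = re.search("^~0(.*?)~", text, re.MULTILINE)

# '$' matches only before "\n" (or at the very end), so the "\r" that
# precedes "\n" stops '~$' from matching.
strict = re.search("^~0(.*?)~$", text, re.MULTILINE)

# Allowing an optional "\r" before the line end works around that.
relaxed = re.search("^~0(.*?)~\r?$", text, re.MULTILINE)

print(start.group(1))    # 123
print(strict)            # None
print(relaxed.group(1))  # 123
```

Note that '.' does not match "\n" unless re.DOTALL is also given, which matters for binary payloads that happen to contain a newline byte.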

wwsheldons · Post by **wwsheldons** » Sat Jun 04, 2016 3:30 am

SpotlightKid wrote:
wwsheldons wrote:who can tell me why?
Me. It is because your second regex tries to match '~0' at the beginning ('^') of the string, but the value of a does not start with '~0'. If you want to match at the beginning of a line, you need the re.MULTILINE flag. Not sure if the MicroPython re module supports that.

https://docs.python.org/3/library/re.ht ... e-contents

tmp=re.match('^abc(.*?)cde$','abc123213cde')
works fine.

Roberthh · Post by **Roberthh** » Sat Jun 04, 2016 7:40 am

Besides the problem of anchored versus non-anchored search, there seems to be a problem with the data.
I tried to split a up into lines and search/match these against the search pattern, with no success.
Using a text with only ASCII characters works, so it may be related to what you're searching in. Trying something like:

a=b'\r\nSEND OK\r\nAT+CIPSEND=125\r\n~011200g\x8a\xb0\xd3F\x0b\xc9\x88\x1e\x9d\xaf\x14\x9f\xfb\x9fR\xcb\xc5\xb4"\xf8>\xff\x0ftc t,\x18\xfa\xad\xf7!0V\xb6\xa6Y\xefH8\x0e\x17\xd4\xc8s\x0f\x04\xd5\xb9+\xbev\x8ej\x0b\xb8.Zf\xe0}\x01\x8f!y_%\xe9\xacO\xf64\xd3K\x06N\x99\xeeR\x92\x7fw7|\x90\x95r\xe8q\xbdnf\x137w=j\x85U\xc6\x93\xceb*\x8d\x96\xeaw\xa1\xee\xf8\r\n> \xd1\xae\xd4\x89~\r\nSEND OK\r\n~003200\xb5\xe6\x946\xcc\n\x11\x9d\xf0]\xd7i\x88\xdc}\x02<\xf1\xd5\x03\xe8\x0b\x14\xdaM\xb8G\x03F\x93\xd6\x8fg\x00\xfa\x86\x04~'
for l in a.decode().split("\r"):
    tmp=re.search("~0", l.lstrip("\n"))

gets a match, but:

tmp=re.search('~0(.*?)~', l.lstrip("\n"))

does not. It does not matter whether search or match is used in that case. The problem is the null bytes in the string.
You could change the loop to:

for l in a.decode().split("\r"):
    ll = repr(l.lstrip("\n"))
    tmp=re.search('~0(.*?)~', ll)
    if tmp:
        print("match", tmp.group(1))

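Then it works, but against a changed search subject: repr() escapes the non-printable characters, so the captured group holds the escaped text rather than the raw data.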
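
If the raw bytes are needed, one alternative is to skip the regex and slice between the delimiters. The following is only a sketch: `extract_frames` is a hypothetical helper, the sample data is invented, and it assumes the end delimiter `~` never occurs inside a payload, which may not hold for arbitrary binary data. It relies only on `bytes.find` and slicing, shown here as they behave in CPython:

```python
def extract_frames(data, start=b"~0", end=b"~"):
    """Return the payloads found between `start` and the next `end`.

    Hypothetical helper; assumes `end` never appears inside a payload.
    """
    frames = []
    pos = 0
    while True:
        begin = data.find(start, pos)
        if begin < 0:
            break  # no further start marker
        payload_start = begin + len(start)
        stop = data.find(end, payload_start)
        if stop < 0:
            break  # start marker without an end marker: incomplete frame
        frames.append(data[payload_start:stop])
        pos = stop + len(end)
    return frames


# Made-up sample containing null bytes and a newline inside the payloads.
sample = b"\r\nSEND OK\r\n~01\x00\x02~\r\nSEND OK\r\n~03\n\x00\x04~"
print(extract_frames(sample))
```

Because the data is never decoded or passed through repr(), null bytes and newline bytes inside a payload come back unchanged.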

wwsheldons · Post by **wwsheldons** » Tue Jun 07, 2016 11:35 am

thanks!

MicroPython Forum (Archive)
