Possible URE non-conformance with micropython-re-pcre breaks JSON parser

cefn
Posts: 230
Joined: Tue Aug 09, 2016 10:58 am

Possible URE non-conformance with micropython-re-pcre breaks JSON parser

Post by cefn » Thu Jul 06, 2017 9:14 am

I think ijson could be very useful for learners processing JSON from web APIs within a low-memory environment. I have gone a long way towards porting the ijson module to micropython, as discussed at viewtopic.php?f=2&t=3534.

However, at the last step I have hit an issue that looks like non-conformance of ure with re.

As far as I can see, I'm not using any re syntax features which ure doesn't support. In any case, I believe that ure should either fail explicitly (tell me I am using unsupported re syntax) or conform (match the behaviour of micropython-re-pcre).

I have done what I can to paper over the cracks from known variations of the API and re support (e.g. omitting character classes, reimplementing .start() and .end()). However, there must be some unsupported feature I am still using.

The only patterns used in the pure-python JSON parsing backend (ijson/backends/python.py) are as below.

Code:

LEXEME_RE = re.compile('[a-z0-9eE\\.\\+-]+|[^ \t\r\n\f]')
STRINGCHUNK = re.compile('(.*?)(["\\\\\\x00-\\x1f])')

I wonder if others on the forum have a better understanding of which features are supported by ure versus micropython-re-pcre and can pinpoint my error so I can fix it.
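For reference, here is a minimal sketch of what these two patterns are expected to match under CPython's re. The sample input and the lexemes() helper are made up for illustration and are not taken from ijson. The real backend switches to STRINGCHUNK after an opening quote, whereas this loop simply applies LEXEME_RE repeatedly.

```python
import re

# The two patterns quoted above, unchanged.
LEXEME_RE = re.compile('[a-z0-9eE\\.\\+-]+|[^ \t\r\n\f]')
STRINGCHUNK = re.compile('(.*?)(["\\\\\\x00-\\x1f])')

sample = '{"cod": 200}'  # hypothetical input, not from the test file


def lexemes(text):
    """Collect successive LEXEME_RE matches (hypothetical helper)."""
    found = []
    pos = 0
    while True:
        match = LEXEME_RE.search(text, pos)
        if match is None:
            break
        found.append(match.group(0))
        pos = match.end()
    return found


tokens = lexemes(sample)
print(tokens)

# STRINGCHUNK applied to the text just after an opening quote:
# group 1 is the string content, group 2 is the terminating character.
chunk = STRINGCHUNK.match('cod": 200}')
print(chunk.groups())
```

Under CPython the key name comes out as a single lexeme 'cod' between two quote lexemes, so an engine that yields something else for the same input is the one diverging.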

For reference, you can find the project at https://github.com/ShrimpingIt/micropython-ijson and the snapshot I am discussing which has a ure issue is at https://github.com/ShrimpingIt/micropyt ... 3cb9bbe283

The test case I am using works in unix micropython 1.8.7 with micropython-re-pcre but fails with ure. The bug can be demonstrated by substituting import ure as re for import re in the pure python backend https://github.com/ShrimpingIt/micropyt ... /python.py

A successful (or unsuccessful) run can be demonstrated using the test function print_local() in https://github.com/ShrimpingIt/micropyt ... 83/main.py

With ure, the test parse fails immediately on the first structure in the test JSON file https://github.com/ShrimpingIt/micropyt ... ermap.json

** UPDATE **

I have simplified the patterns even further, but I am still facing parsing errors with ure, while micropython-re-pcre works as expected...

Code:

LEXEME_RE = re.compile('[a-z0-9eE\\.\\+-]+|[^ \t\r\n\f]')
STRINGCHUNK = re.compile('(.*?)(")')

The error I hit is fairly arbitrary (a by-product of not lexing the JSON correctly), but it presents as follows in a REPL session. It indicates that the parser is trying to process the key name 'cod' as if it were a number, whereas running the same routine on unix micropython with pcre works as expected...

Code:

>>> import main
>>> main.print_local()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "main.py", line 43, in print_local
  File "main.py", line 43, in print_local
  File "main.py", line 29, in print_forecast
  File "ijson/common.py", line 63, in parse
  File "ijson/backends/python.py", line 262, in basic_parse
  File "ijson/backends/python.py", line 202, in parse_value
  File "ijson/backends/python.py", line 200, in parse_value
  File "ijson/common.py", line 156, in number
ValueError: invalid syntax for integer with base 10: 'c'

deshipu
Posts: 1388
Joined: Thu May 28, 2015 5:54 pm

Re: Possible URE non-conformance with micropython-re-pcre breaks JSON parser

Post by deshipu » Thu Jul 06, 2017 11:49 am

Do you think you could come up with the minimal pattern and string examples on which ure gives a different output?

cefn
Posts: 230
Joined: Tue Aug 09, 2016 10:58 am

Re: Possible URE non-conformance with micropython-re-pcre breaks JSON parser

Post by cefn » Thu Jul 06, 2017 12:22 pm

Here's a difference between ure and re for starters, but I don't know how representative it is of the underlying parse error. It was just a speculative choice of text to match...

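Code:

import re   # on unix micropython, provided by micropython-re-pcre
import ure  # micropython's built-in regex module

pattern = '[a-z0-9eE\\.\\+-]+|[^ \t\r\n\f]'
text = '{'
ureRegex = ure.compile(pattern)
reRegex = re.compile(pattern)
ureMatch = ureRegex.match(text)
reMatch = reRegex.match(text)
# Compare the matched text; a failed match counts as None,
# so two failed matches are still treated as equivalent.
ureGroup = ureMatch.group(0) if ureMatch else None
reGroup = reMatch.group(0) if reMatch else None
print("ure and re are" + (" " if ureGroup == reGroup else " not ") + "equivalent")

Running this in micropython v1.9.1-142-ged52955 reports 'ure and re are not equivalent'.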

For reference, the ure result also contradicts the result from CPython's re.
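The same comparison can be run over several inputs at once. The following is only a sketch: compare_engines() and the extra test strings are hypothetical, and it assumes both engines expose compile() and match objects with group(). On MicroPython one would pass ure and re. Here re is compared with itself so that the snippet also runs on CPython, where ure is not available.

```python
import re


def compare_engines(engine_a, engine_b, cases):
    """Return (pattern, text, result_a, result_b) for every disagreement."""
    mismatches = []
    for pattern, text in cases:
        results = []
        for engine in (engine_a, engine_b):
            match = engine.compile(pattern).match(text)
            # Record the matched text, or None when the match fails.
            results.append(match.group(0) if match else None)
        if results[0] != results[1]:
            mismatches.append((pattern, text, results[0], results[1]))
    return mismatches


pattern = '[a-z0-9eE\\.\\+-]+|[^ \t\r\n\f]'
# Hypothetical inputs: brackets, a key name, a number.
cases = [(pattern, text) for text in ('{', '}', 'cod', '200')]

# On MicroPython: compare_engines(ure, re, cases)
mismatches = compare_engines(re, re, cases)
print(mismatches)
```

An empty list means the two engines agree on every case. Each entry otherwise gives the pattern, the text, and both results, which should make it easier to narrow things down to a minimal failing pair.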

cefn
Posts: 230
Joined: Tue Aug 09, 2016 10:58 am

Re: Possible URE non-conformance with micropython-re-pcre breaks JSON parser

Post by cefn » Thu Jul 06, 2017 1:00 pm

Narrowing in, perhaps.

Code:

>>> ure.match("[\\.-]+|[^ ]", "}") is None
True
>>> ure.match("[-]+|[^ ]", "}") is None
False

If the character list contains \\. (double backslash followed by period), then ure.match seems unable to continue on to the | (alternation bar).

deshipu
Posts: 1388
Joined: Thu May 28, 2015 5:54 pm

Re: Possible URE non-conformance with micropython-re-pcre breaks JSON parser

Post by deshipu » Thu Jul 06, 2017 9:08 pm

You can try this as a workaround:

Code:

>>> ure.match("[-\\.]+|[^ ]", "}") is None
False

But it definitely looks like a bug in how "-" is handled inside "[]".
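As a sanity check that the reordering does not change the meaning of the pattern, the two spellings can be compared under CPython's re. This is a sketch with made-up sample strings and a hypothetical first_match() helper. It says nothing about how ure behaves, only that the workaround preserves the intended semantics.

```python
import re

original = re.compile("[\\.-]+|[^ ]")   # hyphen last in the set
reordered = re.compile("[-\\.]+|[^ ]")  # hyphen first in the set

# Hypothetical samples covering both alternatives and the no-match case.
samples = ["}", "{", ".", "-", "-.-", "a", " ", ""]


def first_match(regex, text):
    """Return the matched text at the start of `text`, or None."""
    match = regex.match(text)
    return match.group(0) if match else None


differences = [s for s in samples
               if first_match(original, s) != first_match(reordered, s)]
print(differences)
```

In both spellings the hyphen sits at an edge of the set, so it is a literal rather than a range operator, and the two patterns accept the same inputs.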

pfalcon
Posts: 1155
Joined: Fri Feb 28, 2014 2:05 pm

Re: Possible URE non-conformance with micropython-re-pcre breaks JSON parser

Post by pfalcon » Sat Jul 08, 2017 11:57 am
