Unfortunately, the full JSON responses from Wunderground (45kB) and OpenWeatherMap (14kB), which you can see at http://shrimping.it/tmp/weatherapi/, are both rather large and mostly redundant for this application, meaning parsing the JSON directly isn't a realistic option. For this reason I would rather treat each response as a stream, processing incoming chunks to look for matching substructures and discarding any stream contents which don't match.
It should of course be feasible to run a regex-like state machine over the stream directly, parsing nothing but individual keypairs deep within the data structure, one at a time, and preserving only the information needed. For example, the OpenWeatherMap data contains structures like this to report the rainfall in a 3 hour period...
Code: Select all
"rain":{"3h":0.16}
...or, when no rain has fallen in the period:
Code: Select all
"rain":{}
While I could knock up a loop which pulls these structures out (a bit like the processCommand() function in https://github.com/ShrimpingIt/projects ... etTime.ino ), I would like to avoid authoring an impenetrable, API-specific tokenizing state machine that would flummox learners. Ideally the demo would be a reference example of good programming practice.
In normal circumstances I would be investigating the 'partial=True' flag of PyPI's regex module. The partial flag would enable me to author a regular expression characterising a 'rain' substructure within the stream, and to handle each incoming byte as a potential continuation of a partially-matching string against that pattern, until a complete substructure emerges which can be passed to the json module. Of course, when the regex rejects the string (which it mostly would), no string is progressively cached and matched, meaning you actually only perform substantial matching on...
Code: Select all
"
"r
"ra
"rai
"rain
"rain"
"rain":
"rain":{
Unfortunately the ure module is much more cut-down than PyPI's regex, meaning partial matching is not available.
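For contrast, a non-partial match can only answer yes or no about a complete buffer, with no way to ask whether the buffer could still grow into a match. Shown here with CPython's re (standing in for ure, which exposes a similar subset); the pattern is my own illustration:

```python
import re  # standing in for MicroPython's ure, which offers a similar subset

# Hypothetical pattern characterising a complete "rain" substructure
pattern = re.compile(r'"rain":\{[^}]*\}')

# Without partial matching, an incomplete prefix simply fails...
print(pattern.search('"rain":{'))                    # None
# ...and only a complete substructure succeeds
print(pattern.search('"rain":{"3h":0.16}').group())  # "rain":{"3h":0.16}
```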
My question is: what would people consider good programming practice for this case?
Is there a generic state-machine strategy that I can expose learners to, and that they can hope to replicate for other APIs they encounter, or is the only option to put together my own mystifying tokenizer?