CSV module development

cr0mbly
Posts: 5
Joined: Sat Mar 21, 2020 10:26 pm

CSV module development

Post by cr0mbly » Sat Mar 21, 2020 10:31 pm

Excuse my ignorance first; I'm very interested in this project!

Given how the JSON module has been developed to optionally use a C implementation for a large part of its work (falling back to a Python implementation if the C one is not available), e.g.

Code: Select all

try:
    from _json import make_scanner as c_make_scanner
except ImportError:
    c_make_scanner = None

I'd be keen to take a look at a dedicated Python implementation of the CSV module https://github.com/micropython/micropyt ... master/csv which at the moment is just stubbed. Is there a reason against this? I'd assume performance, as CPython leans heavily on the underlying C _csv library. I was just wondering if a first-pass, pure-Python version would be doable/welcome?

Again, apologies for my ignorance; I'm primarily a Python developer.

jimmo
Posts: 1700
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: CSV module development

Post by jimmo » Sun Mar 22, 2020 3:27 am

Hi,

I was actually looking at the json module in micropython-lib a couple of days ago (as part of https://github.com/micropython/micropython-lib/pull/376). I'm a little bit confused why this module exists, as most ports have a built-in complete json module. Rather than just a helper (i.e. _json), they have the full json module. I don't even know where the _json module exists (it's certainly not part of core micropython). So as far as I can tell, the micropython-lib json module exists purely to provide a pure-python implementation for ports that don't have a built-in json.

Anyway, that's sort of beside the point... Yes, a pure-python CSV library would be useful, and a good contribution to micropython-lib. Thanks!

Some thought would need to go into how to make it compatible with CPython's csv module versus how to still make it micro, but I'm sure there's a minimal subset of the API that can be supported well.

pythoncoder
Posts: 4255
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: CSV module development

Post by pythoncoder » Sun Mar 22, 2020 8:59 am

It's >20 years since I had to decode CSV files but I do recall it being harder than it looks. Special cases are numerous.
Peter Hinch

stijn
Posts: 464
Joined: Thu Apr 24, 2014 9:13 am

Re: CSV module development

Post by stijn » Sun Mar 22, 2020 9:19 am

Exactly. Unfortunately it's been less than a year since I had to work with CSV files, and it's a terrible format to program with because it has no standard whatsoever, except something like "Stuff separated by this arbitrary separator, which can be anything even though we call it 'comma-separated', but only if not quoted, means it's a single value. And oh yeah, I can localize it any way I want but you still should try to parse it correctly." And there are probably even exceptions to that. I get why it became popular, and why it still is, but I'd be happy if it would just drop dead instantly :]
tl;dr: if you find an existing CSV module which passes all or most of CPython's CSV tests then it's definitely worth it. If you do it from scratch, make sure you know what you're dealing with.
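
To make the "only if not quoted" point concrete, here is a small illustration (the sample line is made up for this example) of how splitting on every comma miscounts fields once quoting is involved:

```python
# A made-up line with a comma inside quotes and a doubled (escaped) quote.
line = 'widget,"Smith, John","He said ""hi"""'

# Naive approach: split on every comma, ignoring quotes entirely.
naive_fields = line.split(",")

# What a quote-aware parser would be expected to return for the same line.
expected_fields = ["widget", "Smith, John", 'He said "hi"']

print(len(naive_fields), naive_fields)
print(len(expected_fields), expected_fields)
```

The naive split yields four pieces and leaves the quote characters in place, while the intended reading has three fields. This covers only one of the special cases; delimiters, line endings and localisation vary as well.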

Could also be written in C with a Python wrapper over it: going to be faster, good chance of finding an existing implementation, just a bit harder to share with others because I don't think this would be made part of the MicroPython core and there isn't a de facto standard way of sharing custom C modules at the moment.

pythoncoder
Posts: 4255
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: CSV module development

Post by pythoncoder » Sun Mar 22, 2020 4:50 pm

Amen to that. Recalling my struggle still makes me shudder :? I ended up with something which processed data for indefinite periods but there was no guarantee that it was correct, no way to test it properly and no certainty that the data source wouldn't one day send something which would break it.

The traditional MicroPython approach might be to write a spec for a "micro" subset of csv and implement that. I doubt that has legs: the purpose of CSV is to be a data interchange format. Usually you have no control over the data source. Aside from which, defining a subset of an ill-defined set isn't for the faint-hearted.

For serialisation rather than data interchange then the answer is simple. Use a supported library: ujson, pickle, ustruct or protobuf.
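
As a rough sketch of the serialisation route, assuming a port where either ujson or json is importable (the record contents are invented for illustration), a round trip looks like this:

```python
try:
    import ujson as json  # MicroPython builds that expose ujson
except ImportError:
    import json  # CPython, or ports that expose plain `json`

# Invented example record; note the comma inside the label needs no special care.
reading = {"t": 1585000000, "temp_c": 21.5, "label": "lab, bench 2"}

encoded = json.dumps(reading)   # str, safe to write to a file or send over a link
decoded = json.loads(encoded)   # back to a dict

print(encoded)
print(decoded == reading)
```

Because the same library writes and reads the data, the quoting and delimiter ambiguities of CSV do not arise.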
Peter Hinch

jimmo
Posts: 1700
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: CSV module development

Post by jimmo » Mon Mar 23, 2020 12:21 am

I would echo everything that @pythoncoder and @stijn said -- I have similar bad experiences working with CSV.

However, one example scenario that I've seen many times in MicroPython-in-Education related work is that a student wants to write some sort of "data logger" device, then have an easy way to put that into a spreadsheet. Importantly, without having to write any sort of conversion app on their PC etc. Issues with USB MSC notwithstanding, being able to open a CSV file from the pyboard flash drive straight into Excel is a pretty "wow" workflow. Even without USB MSC, if a student understands how to get Python code onto the device, then they generally know how to get a file off the device. And teachers understand CSV too (and actively encourage this approach) (and the curriculum was designed 20+ years ago).
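
A minimal sketch of that data-logger workflow, assuming purely numeric readings so that no quoting is needed. The helper names `write_header` and `log_reading` and the column names are hypothetical, and `io.StringIO` stands in for a file opened on the device's flash:

```python
import io


def write_header(stream, names):
    # Column titles for the spreadsheet; assumed to contain no commas or quotes.
    stream.write(",".join(names) + "\r\n")


def log_reading(stream, values):
    # Only safe for plain numbers: no quoting or escaping is attempted here.
    stream.write(",".join(str(v) for v in values) + "\r\n")


# On a board this would be something like open("log.csv", "w") instead.
log = io.StringIO()
write_header(log, ["seconds", "temp_c", "humidity_pct"])
log_reading(log, [0, 21.5, 40])
log_reading(log, [10, 21.7, 41])

csv_text = log.getvalue()
print(csv_text)
```

As soon as a value can contain text (a label, a unit string), the no-quoting assumption breaks and a proper writer is needed.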

stijn
Posts: 464
Joined: Thu Apr 24, 2014 9:13 am

Re: CSV module development

Post by stijn » Mon Mar 23, 2020 7:04 am

@cr0mbly maybe you can start by adding the capability to just write CSV files. Even a subset of the CPython API would be quite useful. And it is rather easy in comparison with reading.
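
One possible shape for such a write-only subset is sketched below. It borrows the `writer`, `writerow` and `writerows` names and the `delimiter`, `quotechar` and `lineterminator` parameters from CPython's csv module, but the class itself is a hypothetical illustration, not the micropython-lib implementation, and it supports only minimal quoting:

```python
import io


class writer:
    """Hypothetical write-only sketch modelled on a subset of csv.writer."""

    def __init__(self, stream, delimiter=",", quotechar='"', lineterminator="\r\n"):
        self.stream = stream
        self.delimiter = delimiter
        self.quotechar = quotechar
        self.lineterminator = lineterminator

    def _format(self, value):
        # None becomes an empty field; everything else goes through str().
        text = "" if value is None else str(value)
        needs_quotes = (
            self.delimiter in text
            or self.quotechar in text
            or "\n" in text
            or "\r" in text
        )
        if needs_quotes:
            # Escape embedded quotes by doubling them, then wrap the field.
            doubled = text.replace(self.quotechar, self.quotechar * 2)
            text = self.quotechar + doubled + self.quotechar
        return text

    def writerow(self, row):
        line = self.delimiter.join(self._format(v) for v in row)
        self.stream.write(line + self.lineterminator)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


out = io.StringIO()
w = writer(out)
w.writerow(["name", "note"])
w.writerow(["Smith, John", 'He said "hi"'])
result = out.getvalue()
print(result)
```

Dialects, the other quoting modes and `DictWriter` are deliberately left out of this sketch.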

pythoncoder
Posts: 4255
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: CSV module development

Post by pythoncoder » Mon Mar 23, 2020 7:09 am

Yes but writing CSV is so easy that surely coding it is part of the educational experience. It's reading it from an unknown source that is, er, interesting ;)
Peter Hinch

cr0mbly
Posts: 5
Joined: Sat Mar 21, 2020 10:26 pm

Re: CSV module development

Post by cr0mbly » Fri Mar 27, 2020 6:26 pm

Wow, thank you all for the discussion, this is super helpful!

What I was thinking is taking the existing CPython API and, where possible, just recreating/porting it in pure Python, then rewriting parts back into C down the line once the bottlenecks are found. I guess the interface is a good starting point.
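
As a starting point for the reading side, a pure-Python parser can be sketched as a small state machine. The names `parse_line` and `reader` below are hypothetical, and the sketch assumes a single-character delimiter, double-quote quoting with doubled quotes as the escape, and no line breaks inside quoted fields:

```python
def parse_line(line, delimiter=",", quotechar='"'):
    """Split one line into fields, honouring quotes (simplified sketch)."""
    fields = []
    field = ""
    in_quotes = False
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == quotechar:
                if i + 1 < n and line[i + 1] == quotechar:
                    field += quotechar  # doubled quote means a literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field += ch
        elif ch == quotechar:
            in_quotes = True  # opening quote
        elif ch == delimiter:
            fields.append(field)
            field = ""
        else:
            field += ch
        i += 1
    fields.append(field)
    return fields


def reader(lines, delimiter=","):
    # Accepts any iterable of lines, e.g. an open text file.
    for line in lines:
        yield parse_line(line.rstrip("\r\n"), delimiter)


rows = list(reader(['name,note\r\n', '"Smith, John","He said ""hi"""\r\n']))
print(rows)
```

Running something like this against CPython's csv tests, as suggested earlier in the thread, would show how far such a subset is from full compatibility.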

I'll try to draw something up over the next week/few weeks and see where I get.

Thank you again

cr0mbly
Posts: 5
Joined: Sat Mar 21, 2020 10:26 pm

Re: CSV module development

Post by cr0mbly » Fri Mar 27, 2020 8:04 pm

I've started working on this under this forked branch https://github.com/cr0mbly/micropython- ... csv_module

I've just added an example test written with the MicroPython unittest implementation. Is there a reason why the other modules don't use it?

I'm able to test my work pretty easily running

Code: Select all

make install MOD=csv
../micropython/ports/unix/./micropython csv/test_csv.py

Looking at the unittest output, it doesn't look as fully featured as CPython's unittest, but it seems to be doing the job for smoke testing while working on these libs.
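
For readers who have not seen such a test file, a smoke test in this style might look roughly like the sketch below. The function under test, `format_row`, is a hypothetical stand-in and not taken from the branch above; only `TestCase` and `assertEqual` are used, on the assumption that a reduced unittest implementation is most likely to support those:

```python
import unittest


def format_row(values):
    # Hypothetical function under test: join values with commas, no quoting.
    return ",".join(str(v) for v in values) + "\r\n"


class TestFormatRow(unittest.TestCase):
    def test_numbers(self):
        self.assertEqual(format_row([1, 2.5]), "1,2.5\r\n")

    def test_empty_row(self):
        self.assertEqual(format_row([]), "\r\n")
```

Ending the file with a call to `unittest.main()` should let it be run directly with the interpreter, in the same way as the command shown above.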
Last edited by cr0mbly on Sun Mar 29, 2020 7:35 am, edited 1 time in total.
