A review of serialisation libraries

Posted: Mon Feb 10, 2020 6:43 pm
by pythoncoder
I have written a doc in this repo describing the need for serialisation and summarising the relative advantages and drawbacks of four serialisation libraries available to MicroPython. These are:
  • ujson
  • pickle
  • ustruct
  • Protocol Buffers (a third party library)
There is a tutorial on Protocol Buffers. This library has a slightly challenging learning curve but offers unique advantages.

Any comments or corrections are welcome, here or on GitHub.

Re: A review of serialisation libraries

Posted: Tue Feb 11, 2020 5:35 pm
by dhylands
I've also recently been looking into FlatBuffers, which offer some advantages over Protocol Buffers.
https://google.github.io/flatbuffers/

Admittedly, I've been using flatbuffers in rust, and not in python, but I wouldn't really expect the programming language to make too much difference.

Re: A review of serialisation libraries

Posted: Tue Feb 11, 2020 7:07 pm
by stijn
Another one for consideration would be MessagePack. I haven't compared it with other implementations performance-wise, but we've been using it for a while to interface with external code and haven't found problems so far. Reasons for our use case: it's straightforward and has implementations in other languages (we've been using it to interface with CPython, C++ and Lua). Why not JSON? Mainly because the standard doesn't have nan/inf, which is really problematic for numerical data. Why not protobuf? Quite involved to set up and maintain, and essentially overkill.
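
To illustrate the nan/inf point, here is a minimal sketch using CPython's json module. It assumes standard CPython behaviour; MicroPython's ujson may behave differently and has not been checked here.

```python
import json

reading = float("nan")

# By default CPython emits the bare token NaN. That token is not part of
# the JSON standard, so a strict parser on the other end may reject it.
lenient = json.dumps([1.5, reading])

# With allow_nan=False the encoder refuses instead of emitting the token.
try:
    json.dumps([1.5, reading], allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

print(lenient)
print("strict encoding succeeded:", strict_ok)
```

Either way there is no standard-conforming way to carry the value, which is the problem for numerical data.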

Re: A review of serialisation libraries

Posted: Wed Feb 12, 2020 2:23 am
by tve
There's a huge performance difference between self-describing formats, such as json or messagepack and pre-defined formats, such as protobufs because optimized code can be generated for the latter while the former require some form of interpretation no matter what. However, when using a slow/dynamic language, such as python, it really doesn't make much of a difference...

Re: A review of serialisation libraries

Posted: Wed Feb 12, 2020 2:38 am
by mattyt
Looks good Peter, thanks!

But I tend to use FlatBuffers and MessagePack (and JSON/TOML for human-readable) depending on the use case. MessagePack when I want to allow dynamic messages to be constructed, FlatBuffers when implementing a protocol. At some point I'd like to see support for both of these formats in MicroPython...

Cap'n Proto and Protobuf are two others worth mentioning, but I have generally moved away from them in favour of FlatBuffers.

Re: A review of serialisation libraries

Posted: Wed Feb 12, 2020 8:31 am
by pythoncoder
tve wrote:
Wed Feb 12, 2020 2:23 am
There's a huge performance difference between self-describing formats, such as json or messagepack and pre-defined formats, such as protobufs because optimized code can be generated for the latter while the former require some form of interpretation no matter what. However, when using a slow/dynamic language, such as python, it really doesn't make much of a difference...
That depends on what you're doing with the data. For example, if you're sending it over a radio link, data volume can be crucial. Some channels such as LoRaWAN have bandwidth restrictions, others are just slow.
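
As a rough illustration of the data volume point, the sketch below encodes the same record as JSON and as a fixed binary layout. The record, its field names and the format string are made up for the example, and it is written against CPython's json and struct (on MicroPython the equivalents would be ujson and ustruct).

```python
import json
import struct

# Hypothetical sensor sample: field names and values are invented.
sample = {"temp": 21.5, "humidity": 48, "seq": 1021}

# Self-describing: the field names travel with every message.
as_json = json.dumps(sample).encode()

# Fixed layout agreed by both ends in advance:
# little-endian float32, uint8, uint16.
FMT = "<fBH"
as_struct = struct.pack(FMT, sample["temp"], sample["humidity"], sample["seq"])

# The receiver needs the same format string to recover the values.
temp, humidity, seq = struct.unpack(FMT, as_struct)

print(len(as_json), "bytes as JSON")
print(len(as_struct), "bytes as struct")
```

The binary form is several times smaller here, at the cost of both ends having to agree on the layout beforehand.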

Re: A review of serialisation libraries

Posted: Wed Feb 12, 2020 8:36 am
by pythoncoder
I wrote this because I have used all three official solutions but was intrigued by the properties of Protocol Buffers. In my testing the library works well. If there are good MicroPython implementations of any of these other schemes I'd be interested to see them.

Even better would be PRs to enhance my doc to describe their benefits and usage ;)

Re: A review of serialisation libraries

Posted: Wed Feb 12, 2020 8:51 am
by stijn
tve wrote:
Wed Feb 12, 2020 2:23 am
There's a huge performance difference between self-describing formats, such as json or messagepack and pre-defined formats, such as protobufs
I find that pretty hard to believe. At least in the case of MessagePack. And also because it's a super general non-qualified claim. I searched around a bit and also didn't find anything supporting it, sometimes even the opposite.

Re: A review of serialisation libraries

Posted: Wed Feb 12, 2020 9:04 am
by stijn
pythoncoder wrote:
Wed Feb 12, 2020 8:36 am
If there are good MicroPython implementations of any of these other schemes I'd be interested to see them.
You mean implementations of MessagePack etc?
You can pip install msgpack and add the module to MicroPython's search path. Modify exceptions.py so that it doesn't use multiple inheritance and then it works.
There's also u-msgpack-python but I do not remember the modifications required to make it work on MicroPython. It was twice as slow as msgpack-python for our use case though.

Re: A review of serialisation libraries

Posted: Thu Feb 13, 2020 5:02 pm
by tve
stijn wrote:
Wed Feb 12, 2020 8:51 am
tve wrote:
Wed Feb 12, 2020 2:23 am
There's a huge performance difference between self-describing formats, such as json or messagepack and pre-defined formats, such as protobufs
I find that pretty hard to believe. At least in the case of MessagePack. And also because it's a super general non-qualified claim. I searched around a bit and also didn't find anything supporting it, sometimes even the opposite.
Fair enough. See https://github.com/alecthomas/go_serial ... benchmarks and for the results the tables are easiest to read in the raw doc: https://raw.githubusercontent.com/alect ... /README.md The time/iter column shows the time taken to serialize a struct with random data.

Some examples: std messagepack: 1958 ns/iter, while a special pre-compiled version of messagepack: 420 ns/iter. Protobuf: 319 ns/iter. Json: 4892 ns/iter. A lot of the performance difference comes from dynamic type dispatch, either to figure out what needs to be serialized or to figure out what needs to be allocated to deserialize. These benchmarks are obviously somewhat specific to Golang, but the principle holds: interpreting types (whether using language introspection or some explicit type specifier as used in python's struct) is always going to have a penalty. Now, in python (or javascript or ruby) the difference may be insignificant due to the overall overhead of the language.
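
A toy sketch of what "dynamic type dispatch" means in practice. The tagged format below is invented for illustration and is not MessagePack or any real wire format; it only shows where the per-value decisions happen. Note that Python's struct still interprets its format string at runtime, so this shows the shape of the difference, not the speed-up that generated code gives in Go.

```python
import struct

def encode_tagged(values):
    """Self-describing: every value carries a one-byte type tag."""
    out = bytearray()
    for v in values:
        # Per-value type dispatch: this is the interpretation cost.
        if isinstance(v, bool):
            raise TypeError("bool not supported in this sketch")
        if isinstance(v, int):
            out += b"i" + struct.pack("<i", v)
        elif isinstance(v, float):
            out += b"f" + struct.pack("<f", v)
        else:
            raise TypeError("unsupported type")
    return bytes(out)

def decode_tagged(buf):
    """The decoder has to inspect each tag before it knows what to read."""
    values = []
    pos = 0
    while pos < len(buf):
        tag = buf[pos:pos + 1]
        pos += 1
        if tag == b"i":
            values.append(struct.unpack_from("<i", buf, pos)[0])
        elif tag == b"f":
            values.append(struct.unpack_from("<f", buf, pos)[0])
        else:
            raise ValueError("unknown tag")
        pos += 4
    return values

# Pre-defined: the schema is fixed up front, so no tags and no dispatch.
SCHEMA = struct.Struct("<iif")

record = (7, -3, 0.5)
tagged = encode_tagged(record)
fixed = SCHEMA.pack(*record)

print(len(tagged), "bytes tagged,", len(fixed), "bytes fixed")
print(decode_tagged(tagged), SCHEMA.unpack(fixed))
```

With a fixed schema both ends know the layout in advance, which is what lets a compiled language generate straight-line code for it.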