Re: A review of serialisation libraries
Posted: Thu Feb 13, 2020 6:55 pm
It's also going to depend on the type of data. This got me curious, so I did a quick comparison in Python using the kind of data we typically use. For Protobuf vs MessagePack (the pure-Python MessagePack implementation rather than the compiled one, to make it fair), MessagePack comes out a tiny bit faster than Protobuf: about 14.9 s vs 16.9 s for 1000 serialise/parse round trips under cProfile. But Protobuf also gives you type checking etc. And both are completely destroyed by json, which has a native implementation and does the same job in about 0.8 s. Or possibly I messed something up; it's late already and I'm not sure the comparisons are fair. The code (mydata.proto and test.py) and the results are below.

tl;dr: same old story. First check your requirements, then test a couple of implementations for your specific use case, then choose.
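To make the type-checking point concrete: the plain `Data` class in test.py accepts whatever a json or msgpack round trip hands back, whereas Protobuf raises a `TypeError` when you assign the wrong type to a field. Below is a minimal pure-Python sketch of that kind of check, assuming a dataclass with runtime validation is an acceptable stand-in. `TypedData` and `json_roundtrip` are hypothetical names, not part of Protobuf or msgpack, and the checks are much looser than Protobuf's (no int32 range check, for example).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TypedData:
    """Hypothetical typed stand-in for the Data message in mydata.proto."""
    id: str = ''
    type: str = ''
    values1: List[float] = field(default_factory=list)
    values2: List[int] = field(default_factory=list)

    def __post_init__(self):
        # Scalar fields must be strings, roughly like Protobuf's string fields.
        for name in ('id', 'type'):
            if not isinstance(getattr(self, name), str):
                raise TypeError(f'{name} must be str')
        # Every element of the repeated fields is checked.
        # Note: bool is a subclass of int, so True/False would slip through.
        if not all(isinstance(v, (int, float)) for v in self.values1):
            raise TypeError('values1 must contain numbers')
        if not all(isinstance(v, int) for v in self.values2):
            raise TypeError('values2 must contain ints')


def json_roundtrip(d: TypedData) -> TypedData:
    # Validation runs again when the decoded dict is turned back into an object.
    return TypedData(**json.loads(json.dumps(asdict(d))))


ok = json_roundtrip(TypedData(id='foo', type='bar', values1=[1.0], values2=[1]))
print(ok)

try:
    # The plain Data class would happily accept an int id; this one refuses.
    TypedData(**json.loads('{"id": 42, "type": "bar"}'))
except TypeError as exc:
    print('rejected:', exc)
```

The price is exactly what the benchmark measures: every construction walks both lists in pure Python, so validation is not free.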
mydata.proto
```protobuf
syntax = "proto2";
package tutorial;

message Data {
  required string id = 1;
  required string type = 2;
  repeated float values1 = 3;
  repeated int32 values2 = 4;
}
```
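For a feel of what `SerializeToString()` has to produce for this schema, here is a hand-rolled sketch of the proto2 wire encoding of `Data` using only the stdlib. It is an illustration, not the protobuf library: `varint`, `tag` and `encode_data` are hypothetical helpers, and as far as I know the library emits the same field-by-field layout for this message. Because the repeated fields are not declared `[packed=true]`, proto2 writes a separate tag in front of every element, so each float in `values1` costs 5 bytes.

```python
import struct
from typing import List


def varint(value: int) -> bytes:
    """Base-128 varint; negative int32 values are sign-extended to 64 bits."""
    value &= 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)


def encode_data(id_: str, type_: str, values1: List[float], values2: List[int]) -> bytes:
    """Hypothetical encoder mirroring message Data (unpacked repeated fields)."""
    out = bytearray()
    for number, text in ((1, id_), (2, type_)):
        raw = text.encode('utf-8')
        out += tag(number, 2) + varint(len(raw)) + raw  # wire type 2: length-delimited
    for v in values1:
        out += tag(3, 5) + struct.pack('<f', v)  # wire type 5: 32-bit little-endian float
    for v in values2:
        out += tag(4, 0) + varint(v)  # wire type 0: varint
    return bytes(out)


small = encode_data('foo', 'bar', [1.0], [1])
print(small.hex())  # 0a03666f6f12036261721d0000803f2001

big = encode_data('foo', 'bar', [float(x) for x in range(1000)], list(range(1000)))
print(len(big))  # 7882
```

With `[packed=true]` each repeated field would instead be written once as a tag, a length and the raw values, saving one byte per element.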
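test.py
```python
import os
# Force msgpack's pure-Python fallback; this must be set before msgpack is imported.
os.environ['MSGPACK_PUREPYTHON'] = '1'

import cProfile
import json

import msgpack
# Generated from mydata.proto with: protoc --python_out=. mydata.proto
# Note: protobuf chooses its own backend (pure Python or a compiled one, depending on
# the installed version); PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python forces pure Python.
import mydata_pb2


# Plain Python class with the same fields as the protobuf message.
class Data:
    def __init__(self, **kwargs):
        self.id = kwargs.get('id', '')
        self.type = kwargs.get('type', '')
        self.values1 = kwargs.get('values1', [])
        self.values2 = kwargs.get('values2', [])


def PopulateData(d):
    """Fill either a Data instance or a protobuf message with the same test data."""
    d.id = 'foo'
    d.type = 'bar'
    d.values1.extend([float(x) for x in range(1000)])
    d.values2.extend(list(range(1000)))
    return d


rawData = PopulateData(Data())
protoData = PopulateData(mydata_pb2.Data())
iters = 1000


def RunProtoBuf():
    # Serialise, then parse back into a fresh message.
    for _ in range(iters):
        mydata_pb2.Data().ParseFromString(protoData.SerializeToString())


def RunMessagePack():
    # Pack the attribute dict, unpack it and rebuild a Data object.
    for _ in range(iters):
        Data(**msgpack.unpackb(msgpack.packb(rawData.__dict__), raw=False))


def RunJSon():
    # Same round trip as RunMessagePack, but through json text.
    for _ in range(iters):
        Data(**json.loads(json.dumps(rawData.__dict__)))


cProfile.run('RunProtoBuf()')
cProfile.run('RunMessagePack()')
cProfile.run('RunJSon()')
```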
Results (cProfile summary lines for RunProtoBuf, RunMessagePack and RunJSon, in that order):
```
32723695 function calls (32723688 primitive calls) in 16.911 seconds
29686004 function calls (25670004 primitive calls) in 14.861 seconds
21004 function calls in 0.806 seconds
```
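On the fairness question: cProfile hooks every Python-level call and return, so the pure-Python serialisers (tens of millions of calls above) pay far more profiling overhead than json (about 21 thousand calls), which probably exaggerates the gap. A minimal sketch that times the round trips with `timeit` instead, and first checks that the round trip actually reproduces the input so a lossy serialiser can't look fast. `roundtrip_seconds` is a hypothetical helper; it is shown with the stdlib `json` and `pickle` modules, and `msgpack.packb`/`msgpack.unpackb` can be passed in the same way.

```python
import json
import pickle
import timeit
from typing import Any, Callable


def roundtrip_seconds(dumps: Callable[[Any], Any], loads: Callable[[Any], Any],
                      payload: dict, iters: int = 1000, repeat: int = 3) -> float:
    """Best-of-`repeat` wall time for `iters` serialise+parse round trips, no profiler."""
    # Sanity check: a round trip that loses data should not count as fast.
    if loads(dumps(payload)) != payload:
        raise ValueError('round trip does not reproduce the payload')
    timer = timeit.Timer(lambda: loads(dumps(payload)))
    return min(timer.repeat(repeat=repeat, number=iters))


# Same shape as rawData.__dict__ in test.py.
payload = {
    'id': 'foo',
    'type': 'bar',
    'values1': [float(x) for x in range(1000)],
    'values2': list(range(1000)),
}

for name, dumps, loads in [
    ('json', json.dumps, json.loads),
    ('pickle', pickle.dumps, pickle.loads),
    # With msgpack installed: ('msgpack', msgpack.packb, msgpack.unpackb),
]:
    print(f'{name}: {roundtrip_seconds(dumps, loads, payload):.3f} s per 1000 round trips')
```

Taking the minimum of several repeats is the usual `timeit` advice, since the slower runs mostly measure interference from other processes rather than the code itself.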