str/bytes splitting without heap reallocation

kevinkk525
Posts: 969
Joined: Sat Feb 03, 2018 7:02 pm

str/bytes splitting without heap reallocation

Post by kevinkk525 » Tue Dec 11, 2018 12:46 pm

Is it possible to split a string or bytes object into 2 smaller objects without them being reallocated on the heap? Basically splitting them in place into separate objects with the original one not being available anymore?

The use-case for this is that I receive a very long string but the first part of it is a header and the rest is a json object. I don't want to extend the json object with the header as this would modify an object that the network layer does not "own". But separating these two objects now would result in them being allocated on the heap again needing a lot of heap space. I know I could work around this problem by reading the header from the socket first and then the rest but with the underlying protocol it is easier and safer to send one message as a single object.

The other option would be to pass a memoryview of only the json object to ujson.loads but memoryview is not supported in this case.
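For reference, a minimal sketch of the memoryview idea: slicing a memoryview gives a view onto the original buffer instead of a copy of the data (each slice is only a small view object). The message contents and the 6-byte header length are made up for illustration. The open problem, as described above, is that ujson.loads does not take such a view.

```python
# Hypothetical message: a 6-byte header followed by a JSON payload.
msg = b'header{"a": 1, "b": 2}'
HEADER_LEN = 6  # assumed fixed header length, for illustration only

view = memoryview(msg)            # wraps msg, the bytes are not copied
header_view = view[:HEADER_LEN]   # slicing a memoryview yields another view
payload_view = view[HEADER_LEN:]  # still backed by the original msg buffer

# Only converting a view back to bytes creates a new object holding the data.
header = bytes(header_view)
```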

I found a lot of possible functions for this but as I'm lacking some insight into the internal processes of micropython on the c-level, I don't know if there is a possibility that does not need heap reallocation for this to work.

Another approach would be to use the ujson.load method that accepts a stream but it does not work with an uasyncio.StreamReader object.
Kevin Köck
Micropython Smarthome Firmware (with Home-Assistant integration): https://github.com/kevinkk525/pysmartnode

jickster
Posts: 629
Joined: Thu Sep 07, 2017 8:57 pm

Re: str/bytes splitting without heap reallocation

Post by jickster » Thu Dec 13, 2018 5:33 am

kevinkk525 wrote:I don't want to extend the json object with the header as this would modify an object that the network layer does not "own".
What does this mean?



kevinkk525
Posts: 969
Joined: Sat Feb 03, 2018 7:02 pm

Re: str/bytes splitting without heap reallocation

Post by kevinkk525 » Thu Dec 13, 2018 7:51 am

jickster wrote:
Thu Dec 13, 2018 5:33 am
kevinkk525 wrote:I don't want to extend the json object with the header as this would modify an object that the network layer does not "own".
What does this mean?
It just means that I don't want the network layer to "hack" into the message an application sends. For example, if an application sends a dictionary, I don't want the header to be put into that dictionary. I could of course wrap that dictionary in another dictionary or list containing the header, but that requires more RAM. In the sending direction I could convert the message to json and prepend the header to the resulting string. This probably causes RAM reallocation as well, but as the messages that a microcontroller sends are usually short, this should not be a problem. I'm more concerned about receiving larger messages, because splitting such a large string means it has to be reallocated, needing twice the RAM, as 3 strings then exist: the original message, the header, and the app message (original message minus header). Hence the question whether there is an option to do this without heap reallocation, e.g. splitting a string in place so that the original string does not exist anymore.
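To put a number on the "twice the RAM" estimate, here is a rough sketch that counts characters only and ignores per-object overhead. The message and the 6-character header are invented for the example.

```python
# Hypothetical message: 6-character header plus a JSON payload of about 1 kB.
message = 'header' + '{"data": "' + 'x' * 1000 + '"}'
HEADER_LEN = 6  # assumed fixed header length

header = message[:HEADER_LEN]    # slicing a str creates a new str object
payload = message[HEADER_LEN:]   # and so does this slice

# Characters held while all three objects are alive at the same time.
live_chars = len(message) + len(header) + len(payload)
ratio = live_chars / len(message)
```

While the original message is still referenced, `ratio` comes out at exactly 2, which matches the estimate in the post.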

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: str/bytes splitting without heap reallocation

Post by stijn » Thu Dec 13, 2018 8:33 am

I didn't check this in the code, but I'm fairly sure the principle in uPy is the same as in (most) other languages: if you just split the memory of a string object, the second part is not a string object anymore. A string object needs some kind of object header/identification (in the case of uPy: an interned string is identified by some id byte, and a non-interned one is a struct with a base pointer etc.) and/or an end marker (in the case of null-terminated strings), and neither is present halfway through the string.

The canonical way of what you are doing is working with 'bare' memory, i.e. reserve space for your header, then dump payload after it. Which would need some extra machinery to make this work with json as you figured.
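A minimal sketch of that 'bare memory' approach, assuming a fixed 6-byte header and a message that fits into a 64-byte buffer (both numbers are invented here). A BytesIO object stands in for the socket; the point is only that readinto() fills an existing buffer and that memoryview slices address header and payload without copying them.

```python
try:
    import uio as io   # module name on MicroPython at the time of this thread
except ImportError:
    import io          # CPython and newer MicroPython

HEADER_LEN = 6                # assumed fixed header length
buf = bytearray(64)           # allocated once, reused for every message
view = memoryview(buf)

# Stand-in for a socket; any object offering readinto() would do.
source = io.BytesIO(b'header{"a": 1, "b": 2}')

n = source.readinto(buf)      # fills buf in place and returns the byte count
header = view[:HEADER_LEN]    # view onto the header bytes
payload = view[HEADER_LEN:n]  # view onto the JSON bytes
```

The extra machinery is still needed at this point, because the payload view has to be handed to the JSON parser in a form it accepts.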

kevinkk525
Posts: 969
Joined: Sat Feb 03, 2018 7:02 pm

Re: str/bytes splitting without heap reallocation

Post by kevinkk525 » Thu Dec 13, 2018 8:37 am

That's what I thought. Would already help if only part of the string gets reallocated instead of everything or if the original string gets removed in the process so that there is no temporary need for 2 times the RAM of the string.
Other option would be a memoryview like access to a string or bytes object but that does not seem to exist and saving the string to a bytearray will consume 2 times the RAM again and ujson can't read from a bytearray/memoryview.

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: str/bytes splitting without heap reallocation

Post by stijn » Thu Dec 13, 2018 8:53 am

ujson needs a stream-like object, right? I'm not on a dev machine now but IIRC uPy has stringstream-like objects, would that be a solution? Or in any case there should be some way to adapt memoryview/bytearray to the interface ujson wants because in essence it's a match already.

kevinkk525
Posts: 969
Joined: Sat Feb 03, 2018 7:02 pm

Re: str/bytes splitting without heap reallocation

Post by kevinkk525 » Thu Dec 13, 2018 8:56 am

what would be a stringstream-like object?
ujson could probably be used with a normal stream but not with uasyncio StreamReader as far as I know. But as this network layer receives a string object, it's not an option anyway.
I guess that could be done, but I lack the knowledge, and implementing things in C in the firmware was not actually my goal. I was merely looking for the best solution from the available set of tools (but if I can easily implement it in MicroPython, that would be OK too).

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: str/bytes splitting without heap reallocation

Post by stijn » Thu Dec 13, 2018 5:28 pm

kevinkk525 wrote:
Thu Dec 13, 2018 8:56 am
what would be a stringstream-like object?
Sorry, was thinking in C++.

I mean something like this:

Code: Select all

>>> import uio
>>> import ujson
>>> stream = uio.StringIO()
>>> stream.write('myheader')
8
>>> ujson.dump({'a': 1, 'b': 2}, stream)
>>> stream.getvalue()
'myheader{"a": 1, "b": 2}'

kevinkk525
Posts: 969
Joined: Sat Feb 03, 2018 7:02 pm

[solved] Re: str/bytes splitting without heap reallocation

Post by kevinkk525 » Thu Dec 13, 2018 5:58 pm

Thanks for the clarification and the example!

More interesting than the sending direction is the other way: receiving the string 'myheader{"a": 1, "b": 2}' and converting it into a header and a dict:

Code: Select all

>>> import uio, ujson
>>> c='header{"hi": "ho", "hoho": "hi"}'
>>> s=uio.StringIO(c)
>>> s.read(6)
'header'
>>> ujson.load(s)
{'hi': 'ho', 'hoho': 'hi'}
The question is, is this a RAM efficient method?
The original string still exists, but it seems that I'm getting rid of one allocation step, as json reads the dictionary directly from the stream. Therefore only the header part exists twice in RAM. That overhead is not worth mentioning.
Also the stream does not copy the string so that's great.
Thanks a lot, I think this solves my problem.
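Wrapped into a small helper, the receiving side could look roughly like this. The function name split_message and the fixed header length are assumptions for the sketch and not part of any library.

```python
try:
    import uio as io, ujson as json   # MicroPython names used in this thread
except ImportError:
    import io, json                   # CPython and newer MicroPython

def split_message(message, header_len):
    """Return (header, decoded_json) for a str with a fixed-length header."""
    stream = io.StringIO(message)
    header = stream.read(header_len)  # small allocation for the header only
    data = json.load(stream)          # parses from the current stream position
    return header, data

header, data = split_message('header{"hi": "ho", "hoho": "hi"}', 6)
```

Whether the stream really avoids copying the message depends on the port and version, so that part is worth measuring on the target board.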

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: str/bytes splitting without heap reallocation

Post by pythoncoder » Fri Dec 14, 2018 6:41 am

When you issue

Code: Select all

s.read(header_length)
does an allocation occur for the header? Clearly it would have if you'd assigned the result to a variable, but in this case we've discarded it. So, either an allocation has occurred which GC will reclaim, or no allocation occurred.

It looks like allocation does occur:

Code: Select all

import pyb, uio, utime, micropython
micropython.alloc_emergency_exception_buf(100)

c = 'header{"hi": "ho", "hoho": "hi"}'
s = uio.StringIO(c)

def cb(t):
    s.read(6)

t = pyb.Timer(1)
t.init(freq = 1, callback = cb)
utime.sleep(1.5)
print(s)
Outcome:

Code: Select all

>>> uncaught exception in Timer(1) interrupt handler
Traceback (most recent call last):
  File "<stdin>", line 6, in cb
MemoryError: memory allocation failed, heap is locked

>>> 
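One possible way around the header allocation is to read it into a buffer that was allocated up front. This is only a sketch: it uses BytesIO instead of StringIO, assumes a 6-byte header, and I have not checked whether readinto() is allocation-free on every port, so the timer test above would need to be repeated to confirm it.

```python
try:
    import uio as io, ujson as json   # MicroPython names used in this thread
except ImportError:
    import io, json                   # CPython and newer MicroPython

HEADER_LEN = 6                      # assumed fixed header length
header_buf = bytearray(HEADER_LEN)  # allocated once, before any messages arrive

s = io.BytesIO(b'header{"hi": "ho", "hoho": "hi"}')
n = s.readinto(header_buf)          # copies the header into the existing buffer
data = json.load(s)                 # parses the rest; this allocates the dict
```

Parsing the JSON allocates in any case, since it has to build the dict, so that step does not belong in an interrupt handler either way.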
Peter Hinch
Index to my micropython libraries.
