Launching a separate process instead of calling a memory intensive function

kristopher · Post by **kristopher** » Tue Mar 26, 2019 12:03 pm

Good day to everyone,

First post so please bear with me.

I am looking for a mechanism to "capture" the current state of the heap so that I can quasi-restore it to the point of capture. Usually you have an input for a given function (for specifics, let's say it is a processing-intensive one), call it (it performs the processing), and return to the previous context.

If the function can clean up after itself (properly dispose of its resources without any future consequences), then all is fine and the program may continue as if nothing happened. In MicroPython running on microcontrollers, however, the heap becomes fragmented when the program runs for a long time.

I am not trying to generally solve the problem of fragmentation but if I have the luxury of isolating one specific function that causes a lot of memory churn then I could potentially save the context before calling that function and later on attempt to restore the context in some way.

The immediate solution might be to use a separate process (isolated from main.py) and, after the computation, purge the memory it allocated, potentially postponing fragmentation. I've looked at uos in hopes of finding some sort of fork, but without any luck. Also, gc.collect() is not an option because the harm of multiple allocations would already be done.

I hope I was clear on what sort of problem I am tackling and that my question contributes towards benefiting the community.

stijn · Post by **stijn** » Tue Mar 26, 2019 3:58 pm

What platform do you use? If you don't run on top of an actual OS you can't really speak of another 'process'. But you could still do things like send data over the network to whatever other device you have, process over there, and get results back.

kristopher · Post by **kristopher** » Tue Mar 26, 2019 7:01 pm

I am using the second latest version of MicroPython for ESP8266. The file is esp8266-20180511-v1.9.4.bin.

I do agree that there is no operating system and therefore no sense in having operating system primitives like processes. But I would like to think that there would be a way to have such capabilities, like designating a fixed memory area in which code runs isolated from the main.py script and is completely disposable afterwards (no need for it to be asynchronous though).

I found that there is a _thread module, but from the looks of it, it puts its own strain on memory because the threads are not recycled afterwards and leave fragments.

Cheers

jickster · Post by **jickster** » Tue Mar 26, 2019 8:18 pm

In progress

Scoped allocation

https://github.com/micropython/micropython/issues/4081

pythoncoder · Post by **pythoncoder** » Wed Mar 27, 2019 9:15 am

If you put the memory intensive task into a function you should be able to issue gc.collect() from the calling code after return from the function. Any local references to allocated objects will now be out of scope and the memory will be reclaimed. Clearly you can break this with references to globals or bound variables, but if you keep everything local all should be well.

Code: Select all

import gc
import math

def func():
    d = 1.234 # FP therefore allocates
    return math.sin(d)  # Allocates result

def run_it():
    s = func()
    gc.collect()  # d is out of scope and is reclaimed
    # s is a reference to result so is necessarily still on heap

kristopher · Post by **kristopher** » Wed Mar 27, 2019 7:00 pm

Thank you for your answer.

If only it were that elegant. Unfortunately, the memory-intensive function I have performs message packing and is effectively string building, which is difficult in a resource-limited environment when the message format is not trivial enough to be handled by a plain old format() call.

With the strings that I have that make up various parts of my message I have to first construct the format itself based on the starting conditions of the board. That is in my mind the biggest bottleneck and is proposed to be solved by precomputing the initial message format and storing it as a file for the duration of the power cycle. Afterwards at some point the message would be infilled and pushed to the internet. I would like to point out that I am not storing everything in memory and for efficiency reasons write to the socket in chunks.

The approach to precompute the format and store the result as a file would work if I could completely erase any traces of the previous computations but so it is that even after a call to gc.collect() the available memory does not increase which in my view is due to fragmentation (what else could it be when the function does not return anything nor uses any global variables). Not sure how could it be investigated further when there are no instruments to take a snapshot of the current heap and analyse the circumstances that make up the lack of available memory.

jickster · Post by **jickster** » Wed Mar 27, 2019 7:07 pm

Show us some pseudocode

kristopher · Post by **kristopher** » Wed Mar 27, 2019 8:08 pm

Code: Select all

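from functools import reduce  # not a builtin in Python 3; on MicroPython, functools may need to be installed from micropython-lib

DOUBLE_QUOTES = '%22'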
SINGLE_QOUTES = '%27'
SEMICOLON = '%3A'
COMMA = '%2C'
CURLY_BRACKETS_OPEN = '%7B'
CURLY_BRACKETS_CLOSE = '%7D'
SQUARE_BRACKETS_OPEN = '%5B'
SQUARE_BRACKETS_CLOSE = '%5D'
SPACE = '+'

TIMESTAMP = 'timestamp'
SENSOR_ID = 'sensor_id'
VALUE = 'value'

NEWLINE = "\r\n"

HOST = 'www.webpage.com'
PAGE = 'index.php'
METHOD_HEADER = [ "POST /", PAGE, " HTTP/1.0", NEWLINE ]
HOST_HEADER = [ "Host: ", HOST, NEWLINE ]
CONTENT_TYPE_HEADER = [ "Content-type: application/x-www-form-urlencoded; charset=UTF-8", NEWLINE ]
CONTENT_LENGTH_HEADER = [ "Content-Length: " ]
PAYLOAD = 'payload='

def packAttributeValuePair(attribute, value):
	return [ SINGLE_QOUTES, attribute, SINGLE_QOUTES, SEMICOLON, value ]

def packAttributeValuePairs(pairs):
	return [ CURLY_BRACKETS_OPEN ] + reduce( lambda x, y: x + [ COMMA ] + packAttributeValuePair(*y), pairs, packAttributeValuePair(*pairs.pop(0))) + [ CURLY_BRACKETS_CLOSE ]

def packArray(array):
	return [ SQUARE_BRACKETS_OPEN ] + reduce( lambda x, y: x + [ COMMA ] + packAttributeValuePairs(y), array, packAttributeValuePairs(array.pop(0))) + [ SQUARE_BRACKETS_CLOSE ]

def packMessage(readings):
	payload = [ PAYLOAD ] + packArray(readings)
	length = sum(map(len, payload))
	return METHOD_HEADER + HOST_HEADER + CONTENT_TYPE_HEADER + CONTENT_LENGTH_HEADER + [str(length), NEWLINE, NEWLINE] + payload + [NEWLINE]

A few comments:
1) packMessage() is the function that I use to produce the format of the message. In my design it is meant to be run only once before entering the loop that attempts to read sensor values over time.
2) In this snippet packMessage() returns a list of strings but what I do is then write them to a file for later infilling.
3) The reason why I wrote this function in the first place is to merge HTTP message encoding and JSON encoding into a single pass. I based it all on the assumption that list operations are efficient in Python.

dhylands · Post by **dhylands** » Wed Mar 27, 2019 8:24 pm

kristopher wrote: ↑
Wed Mar 27, 2019 7:00 pm
The approach to precompute the format and store the result as a file would work if I could completely erase any traces of the previous computations but so it is that even after a call to gc.collect() the available memory does not increase which in my view is due to fragmentation (what else could it be when the function does not return anything nor uses any global variables). Not sure how could it be investigated further when there are no instruments to take a snapshot of the current heap and analyse the circumstances that make up the lack of available memory.

If the available memory does not increase after calling gc.collect() then it's because somebody is still holding onto the memory someplace. For example, if you create a string, let's say x, and then call gc.collect(), the memory that x occupies won't be freed. If, however, you were to assign None to x, then the big string would be freed. For example:

Code: Select all

>>> import gc
>>> def foo():
...     ret = 'This is a really long string that will allocate some memory'
...     ret += 'Another long string to make sure that we use a string'
...     ret += 'and not a qstr'
...     return ret
... 
>>> def test():
...     gc.collect()
...     print('Starting free memory:', gc.mem_free())
...     x = foo()
...     gc.collect()
...     print('After calling foo:', gc.mem_free())
...     x = None
...     gc.collect()
...     print('After setting x to None', gc.mem_free())
... 
>>> test()
Starting free memory: 101072
After calling foo: 100928
After setting x to None 101072

Having fragmented memory is different from having free memory. Having fragmented memory reduces the size of the largest object you can allocate; it doesn't necessarily affect the amount of free memory that you have.
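To make the distinction concrete, here is a toy model, not MicroPython's actual allocator. The heap is represented as a list of fixed-size blocks (True means used), and the hypothetical helper `heap_stats` reports both the total number of free blocks and the longest contiguous free run. Two heaps can have the same amount of free memory while differing in the largest object they could hold.

```python
def heap_stats(blocks):
    """Return (free_blocks, largest_free_run) for a toy heap.

    blocks is a list of booleans, True meaning the block is in use.
    This is an illustrative model only, not how MicroPython stores its heap.
    """
    free = blocks.count(False)
    largest = run = 0
    for used in blocks:
        # A used block ends the current free run; a free block extends it.
        run = 0 if used else run + 1
        largest = max(largest, run)
    return free, largest


compact = [True] * 4 + [False] * 4   # all free blocks are adjacent
fragmented = [True, False] * 4       # free blocks are scattered

print(heap_stats(compact))      # (4, 4)
print(heap_stats(fragmented))   # (4, 1)
```

Both heaps report four free blocks, but only the compact one could satisfy a request for four contiguous blocks.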

If you call a function and you have less memory when that function returns then it's because that function somehow caused memory to be allocated which wasn't freed. This could be because a file or socket was opened and not closed, or because it affected a global or it's returning something.
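The check described above can be wrapped in a small helper. This is a minimal sketch: `retained_bytes` is a hypothetical name, and the `mem_free` parameter is an assumption added so the probe can be swapped out where `gc.mem_free()` is unavailable (it exists in MicroPython but not in CPython).

```python
import gc


def retained_bytes(func, mem_free=None):
    """Return how many bytes remain allocated after func() has returned.

    mem_free is a callable reporting free heap bytes. It defaults to
    gc.mem_free, which is MicroPython-specific.
    """
    if mem_free is None:
        mem_free = gc.mem_free
    gc.collect()
    before = mem_free()
    func()          # the return value is discarded, so nothing is held here
    gc.collect()
    return before - mem_free()


# Only run the live measurement where gc.mem_free exists (e.g. on the board).
if hasattr(gc, "mem_free"):
    print("retained:", retained_bytes(lambda: [0] * 100))
```

A result well above zero suggests the function left something reachable, such as a global, an open file or socket, or a cached object.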

stijn · Post by **stijn** » Thu Mar 28, 2019 9:12 am

kristopher wrote: ↑
Wed Mar 27, 2019 7:00 pm
even after a call to gc.collect() the available memory does not increase

As Dave says: if that truly is the case, you're somehow still holding references to memory allocated in that function.

As far as your pseudocode goes: depending on how 'pseudo' it is, there are probably other ways which are less memory-hungry and cause less fragmentation than constructing a lot of temporary lists.
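One possible way to avoid the temporary lists, sketched here under assumptions, is to yield the pieces from generators and write each piece out as soon as it is produced. The constants are copied from the snippet earlier in the thread (with their original spelling); `iter_pairs`, `iter_array` and `write_array` are hypothetical names, and the `io.StringIO` target only stands in for a file or socket. Whether this actually lowers peak memory use on a given board would need to be measured.

```python
import io

# Constants copied from the snippet posted earlier in the thread.
SINGLE_QOUTES = '%27'
SEMICOLON = '%3A'
COMMA = '%2C'
CURLY_BRACKETS_OPEN = '%7B'
CURLY_BRACKETS_CLOSE = '%7D'
SQUARE_BRACKETS_OPEN = '%5B'
SQUARE_BRACKETS_CLOSE = '%5D'


def iter_pairs(pairs):
    # Yield the pieces of one {...} object without building a list.
    yield CURLY_BRACKETS_OPEN
    first = True
    for attribute, value in pairs:
        if not first:
            yield COMMA
        first = False
        yield SINGLE_QOUTES
        yield attribute
        yield SINGLE_QOUTES
        yield SEMICOLON
        yield value
    yield CURLY_BRACKETS_CLOSE


def iter_array(array):
    # Yield the pieces of the [...] array, one object at a time.
    yield SQUARE_BRACKETS_OPEN
    first = True
    for pairs in array:
        if not first:
            yield COMMA
        first = False
        for piece in iter_pairs(pairs):
            yield piece
    yield SQUARE_BRACKETS_CLOSE


def write_array(stream, array):
    # Write each piece immediately and return the total number of characters.
    total = 0
    for piece in iter_array(array):
        stream.write(piece)
        total += len(piece)
    return total


readings = [[('sensor_id', '1'), ('value', '20')]]
out = io.StringIO()
length = write_array(out, readings)
print(length, out.getvalue())
```

Unlike the pop(0)-based version, this does not modify the input list. Because Content-Length has to be sent before the body, the length could be obtained with a first pass that only sums `len(piece)` and stores nothing.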

MicroPython Forum (Archive)
