Memory allocation and deallocation

sameid
Posts: 11
Joined: Mon Feb 27, 2017 7:30 am

Post by sameid » Mon Feb 27, 2017 11:43 am

First of all I would like to say that I find your work amazing. Bringing the Python language to this standard is really beautiful - I can finally use its embeddable properties as a replacement for Lua - with better syntax (generators look way better) and better language support (exceptions are just great).

I started by developing a small module in C as an extension, and I still haven't figured out how the GC finds "abandoned" memory.

I've read the wiki page regarding the GC and it's still a mystery to me.

Question 1.
Suppose I create my own type with a "new" function.
In that function I allocate some memory with m_new and return it. I understand that the object is now referenced by the Python VM, and that once no references remain the GC will be able to reclaim that memory.
But suppose my object has a pointer field to an array of ints that I also allocate dynamically, from the same Python heap, in my "new" function. How does the GC know to connect the outer object seen by the VM with my inner allocation?
At first I thought I should implement a specific __del__ function for my type, but __del__ doesn't even get called!

Question 2.
What happens if an exception is thrown in my C code, using nlr magic, after I have allocated some memory using m_new but haven't returned it yet?

Question 3.
Connected to question 2 - in objstr.c we have:

    STATIC mp_obj_t str_encode(size_t n_args, const mp_obj_t *args) {
        mp_obj_t new_args[2];
        if (n_args == 1) {
            new_args[0] = args[0];
            new_args[1] = MP_OBJ_NEW_QSTR(MP_QSTR_utf_hyphen_8);
            args = new_args;
            n_args++;
        }
        return bytes_make_new(NULL, n_args, 0, args);
    }

1. New qstr - is that a heap allocation?
2. New bytes - is that a heap allocation?

What happens if the second one fails? nlr magic? What happens to the memory allocated first?

(Nothing special about objstr.c just tried to find two allocations without checks in between)

Thanks!
Sam

dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada

Re: Memory allocation and deallocation

Post by dhylands » Mon Feb 27, 2017 4:31 pm

The way that the GC (garbage collector) works is that any object which has a pointer to it is kept.

MicroPython has the notion of root pointers. These are maintained in the mp_state_t struct:
https://github.com/micropython/micropyt ... #L100-L169

Any object which is pointed to through a root pointer, or another object, or a pointer on the stack or from one of the registers will be kept. Other objects will be freed.

So if you allocate something and then throw an exception, the first allocated object will generally wind up being freed (at garbage collection time), unless a pointer to it was ultimately stored in a root pointer or other python object.
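
To make the "allocate, then throw" case concrete, here is a minimal standalone sketch. It uses plain setjmp/longjmp in place of MicroPython's nlr mechanism, and a toy bump allocator in place of the GC heap; every toy_* name and the call_protected helper are hypothetical and not part of MicroPython.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy bump allocator standing in for a GC-managed heap (hypothetical). */
static unsigned char toy_heap[256];
static size_t toy_used = 0;

static void *toy_alloc(size_t n) {
    assert(toy_used + n <= sizeof(toy_heap));
    void *p = &toy_heap[toy_used];
    toy_used += n;
    return p;
}

/* Jump target standing in for the innermost exception handler. */
static jmp_buf toy_handler;

/* Allocates, then optionally "raises" before the pointer is stored anywhere. */
static void *alloc_then_raise(int fail) {
    void *buf = toy_alloc(32);
    if (fail) {
        /* Non-local exit: buf is lost, and no cleanup code runs. */
        longjmp(toy_handler, 1);
    }
    return buf;
}

/* Returns 1 if the call raised, 0 otherwise; *out receives the result. */
static int call_protected(int fail, void **out) {
    if (setjmp(toy_handler) == 0) {
        *out = alloc_then_raise(fail);
        return 0;
    }
    *out = NULL;
    return 1;
}
```

After call_protected(1, &p), the 32 bytes are still counted in toy_used but nothing points to them any more. The toy allocator never reclaims them; a tracing collector would, because at the next collection no root or live object leads to that block.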

QSTRs are otherwise known as "interned" strings. These are referenced by index and not by pointer, which allows some performance improvements. Once a QSTR is allocated it never gets freed. There is a pool of them. QSTRs are also only allocated once. If a QSTR already exists, then its index will be returned rather than allocating a new one.
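
The "allocated once, referenced by index" behaviour can be sketched with a toy intern pool. This is only an illustration of the idea: toy_pool, toy_intern and the fixed pool size are made up, and MicroPython's real qstr pool is organised differently (it also hashes strings and includes entries generated at build time). As far as I know, a compile-time name such as the one passed to MP_OBJ_NEW_QSTR in str_encode is already in the pool, so that macro only packs an index into an object word and does not allocate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_POOL_MAX 16   /* arbitrary capacity for the sketch */

/* Pool of interned strings; entries are never removed. */
static const char *toy_pool[TOY_POOL_MAX];
static size_t toy_pool_len = 0;

/*
 * Returns the index of str in the pool, adding it if it is not there yet.
 * The pool stores the pointer itself, so the caller must keep str alive.
 */
static size_t toy_intern(const char *str) {
    for (size_t i = 0; i < toy_pool_len; i++) {
        if (strcmp(toy_pool[i], str) == 0) {
            return i;   /* already interned: reuse the existing index */
        }
    }
    assert(toy_pool_len < TOY_POOL_MAX);
    toy_pool[toy_pool_len] = str;
    return toy_pool_len++;
}
```

Interning "utf-8" twice returns the same index both times, and comparing two interned names becomes an integer comparison instead of a strcmp.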

sameid
Posts: 11
Joined: Mon Feb 27, 2017 7:30 am

Re: Memory allocation and deallocation

Post by sameid » Tue Feb 28, 2017 6:46 am

Thanks for the explanation, but I still haven't gotten my answer.

Say a C function returns to the VM an object of my own type, allocated with m_new as 10 bytes. The first 4 bytes of that struct (on a 32-bit machine) are an internal pointer to another piece of memory, say 4 bytes that were also allocated with m_new, holding a dynamically allocated int.

When my own object gets GC'd, how can the GC possibly know about my 4 internally allocated bytes? Shouldn't they leak? Isn't some kind of destructor needed for my object?

Thanks,
Sam

dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada

Re: Memory allocation and deallocation

Post by dhylands » Tue Feb 28, 2017 7:21 am

The GC will examine all of the objects and follow any pointers contained within them.

For example, here's a C structure for the UART: https://github.com/micropython/micropyt ... .c#L78-L92 which contains a pointer to the read buffer (line 91).

A consequence of this is that the GC will scan the UART object, see the pointer, and thus not free the read buffer. Because of the way the GC works, if an object contains a bit pattern which merely looks like a heap pointer, the GC might accidentally keep an object which should otherwise be freed.

When the UART object gets GC'd, then there will no longer be a pointer to the read_buf (assuming that there isn't one someplace else), so it will in turn be GC'd.
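
The behaviour described above can be modelled in a few lines of standalone C. This is a toy conservative mark-and-sweep, not MicroPython's gc.c: the fixed-size blocks, the in_use/marked tables and all toy_* names are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 4   /* words per block (assumed) */
#define NUM_BLOCKS  8   /* blocks in the toy heap (assumed) */

static uintptr_t heap[NUM_BLOCKS][BLOCK_WORDS];
static unsigned char in_use[NUM_BLOCKS];
static unsigned char marked[NUM_BLOCKS];

/* Hands out one zeroed block, or NULL if the toy heap is full. */
static void *toy_gc_alloc(void) {
    for (int b = 0; b < NUM_BLOCKS; b++) {
        if (!in_use[b]) {
            in_use[b] = 1;
            memset(heap[b], 0, sizeof(heap[b]));
            return heap[b];
        }
    }
    return NULL;
}

/* Block index if w looks like a pointer to the start of a live block, else -1. */
static int block_from_word(uintptr_t w) {
    uintptr_t start = (uintptr_t)&heap[0][0];
    uintptr_t end = start + sizeof(heap);
    if (w < start || w >= end) {
        return -1;                          /* not inside the heap */
    }
    uintptr_t off = w - start;
    if (off % sizeof(heap[0]) != 0) {
        return -1;                          /* not block-aligned */
    }
    int b = (int)(off / sizeof(heap[0]));
    return in_use[b] ? b : -1;
}

/* Marks the block w points to, then every block reachable from its words. */
static void mark_word(uintptr_t w) {
    int b = block_from_word(w);
    if (b < 0 || marked[b]) {
        return;
    }
    marked[b] = 1;
    for (int i = 0; i < BLOCK_WORDS; i++) {
        mark_word(heap[b][i]);
    }
}

/* Marks from the given roots, frees every unmarked block, returns how many. */
static int toy_gc_collect(const uintptr_t *roots, size_t n_roots) {
    memset(marked, 0, sizeof(marked));
    for (size_t i = 0; i < n_roots; i++) {
        mark_word(roots[i]);
    }
    int freed = 0;
    for (int b = 0; b < NUM_BLOCKS; b++) {
        if (in_use[b] && !marked[b]) {
            in_use[b] = 0;
            freed++;
        }
    }
    return freed;
}
```

If an "outer" block stores the address of an "inner" block in one of its words and only the outer block is passed as a root, both survive a collection. Drop the root and both are freed on the next pass, with no destructor involved.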

sameid
Posts: 11
Joined: Mon Feb 27, 2017 7:30 am

Re: Memory allocation and deallocation

Post by sameid » Tue Feb 28, 2017 7:50 am

Interesting, so let's see if I got it right:

1. If the heap contains a pointer to the heap - the GC will treat it as a referenced memory. (It must be aligned?)

2. The GC has to scan all bytes in the heap to clean internally referenced memory

3. If, god forbid, my internal pointer is not aligned, the memory it points to will eventually be GC'd? Making it point to garbage.

4. The pattern is something like pointer size in heap range with a certain suffix? (Depending on block size?)

5. If I have a large array of ints on the heap which never gets freed and by 'accident' these ints are following the pattern of the heap - memory will not be GC'd?

6. Finally, if objects have internal state - the correct way to use them is with a ".close()" function like in sockets

Thanks,
Sam

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: Memory allocation and deallocation

Post by pythoncoder » Tue Feb 28, 2017 12:22 pm

Small integers are constrained such that b31==b30. This enables uPython to reliably distinguish between a small integer and a pointer which is guaranteed not to follow this pattern. This is possible because pointers are word aligned, so have two 'spare' bits. It's binary blobs like bytes instances which can (rarely) be mistaken for pointers and needlessly retained.

Pointers created by uPython will always be word aligned. I guess you could create a non-aligned pointer in C or inline assembler but making this visible to uPython is probably a bad idea.
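
A small sketch of how the two "spare" bits can be used, assuming a 32-bit word. The encoding below (value shifted left by one with the low bit set for small integers, both low bits clear for pointers) is my reading of the default object representation and is not stated in this thread; the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* True if n keeps its sign after a 1-bit left shift, i.e. bit 31 == bit 30. */
static int toy_small_int_fits(int32_t n) {
    uint32_t u = (uint32_t)n;
    return ((u ^ (u << 1)) & 0x80000000u) == 0;
}

/* Packs n into a tagged word: value in the upper 31 bits, bit 0 set to 1. */
static uint32_t toy_new_small_int(int32_t n) {
    assert(toy_small_int_fits(n));
    return ((uint32_t)n << 1) | 1u;
}

/* Small integers are recognised by bit 0 alone. */
static int toy_is_small_int(uint32_t word) {
    return (word & 1u) != 0;
}

/* A word-aligned pointer has both low bits clear. */
static int toy_is_aligned_pointer(uint32_t word) {
    return (word & 3u) == 0;
}

/* Recovers the value with a sign-preserving shift right by one bit. */
static int32_t toy_small_int_value(uint32_t word) {
    uint32_t v = word >> 1;
    if (word & 0x80000000u) {
        v |= 0x80000000u;   /* copy the sign bit back in */
    }
    return (int32_t)v;
}
```

With this layout a tagged small integer always has bit 0 set, so it can never be confused with an aligned pointer, while the bit 31 == bit 30 test decides whether a value fits in the 31 bits left over.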

If an object has internal state its RAM will be reclaimed when the last pointer to the object goes out of scope. There is usually no need to code explicit closure.
Peter Hinch

sameid
Posts: 11
Joined: Mon Feb 27, 2017 7:30 am

Re: Memory allocation and deallocation

Post by sameid » Tue Feb 28, 2017 3:43 pm

I see,

It's a shame that there is no clear/thorough documentation regarding this matter - it feels like one of the basics for anyone who wants to contribute.

In any case, big thanks for the explanations!

Sam

pfalcon
Posts: 1155
Joined: Fri Feb 28, 2014 2:05 pm

Re: Memory allocation and deallocation

Post by pfalcon » Sat May 13, 2017 11:46 pm

"It's a shame that there is no clear/thorough documentation regarding this matter"
There's clear and thorough documentation, it's in the usual places - Wikipedia, Google, source code. Google gives about 2 million hits for "garbage collection memory". The sad conclusion one may come to is that if those 2 million pages didn't help, a 2 million and first won't help either :-(.

elliotwoods
Posts: 23
Joined: Wed Dec 04, 2019 8:11 am

Re: Memory allocation and deallocation

Post by elliotwoods » Mon Jul 13, 2020 11:24 am

Oh this is really interesting.
I'm having issues here with my allocated memory becoming invalid after garbage collection.
I have C++ classes that are exported across to C.

I have read up on garbage collection before, but I was unaware of the strategy of scanning the heap for pointers.
It seems strange that such a strategy would be safe at all (I presume it is, but I'm just not getting something),
e.g. if you have an array of uint32_t's,
then OK, maybe they are not aligned the same way as pointers, but the MSBs from one element combined with the LSBs of the next element could look like a valid pointer.
How can such a situation be avoided?
This maybe isn't my issue, I'm just curious :)
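
On the straddling question, a sketch of the kind of test a conservative scanner applies may help. As far as I can tell the collector reads memory one whole, aligned pointer-sized word at a time, so the top half of one element is never glued to the bottom half of the next. The 16-byte block size and the toy_* names below are assumptions for illustration, not values taken from MicroPython.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_BYTES 16u   /* assumed allocation granularity */

/*
 * A word is treated as a possible heap pointer only if it lies inside
 * [heap_start, heap_end) and is aligned to the block size.
 * heap_start is assumed to be block-aligned itself.
 */
static int toy_looks_like_heap_ptr(uintptr_t w, uintptr_t heap_start,
                                   uintptr_t heap_end) {
    return w >= heap_start && w < heap_end && (w % TOY_BLOCK_BYTES) == 0;
}

/*
 * Examines each whole word of an array in turn and counts the candidates.
 * Bytes from neighbouring words are never combined.
 */
static size_t toy_count_candidates(const uintptr_t *words, size_t n,
                                   uintptr_t heap_start, uintptr_t heap_end) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (toy_looks_like_heap_ptr(words[i], heap_start, heap_end)) {
            count++;
        }
    }
    return count;
}
```

A single element whose value happens to equal a block address would still count as a candidate. In a scheme like this that can only cause a block to be kept longer than necessary; it does not by itself cause live memory to be freed.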


My actual issue is that objects within my C++ code are being invalidated by the garbage collector.
I'm currently investigating, but I might reach out for help.

elliotwoods
Posts: 23
Joined: Wed Dec 04, 2019 8:11 am

Re: Memory allocation and deallocation

Post by elliotwoods » Mon Jul 13, 2020 12:01 pm

Ok here's where I'm at:

TensorFlow asks for a chunk of memory that it can use:

    const int tensor_arena_size = 2 * 1024;
    uint8_t tensor_arena[tensor_arena_size];

It then uses this memory to store everything (both the model and the variables).
On garbage collection, this pointer is generally invalidated.

So now I'm using:

    this->heapArea = m_malloc(tensorArenaSize + alignment);
    this->tensorArena = (uint8_t*) MP_ALIGN(this->heapArea, alignment);

And I also copy that pointer into the struct which represents the model in the micropython module.
Now this memory is not invalidated as a whole, but some parts of the memory inside are now corrupted.
I presume this is because there are pointers inside that block of allocated memory which get either moved or deleted by the garbage collector.

Is it possible for the garbage collector to simply ignore a section of memory altogether?

thank you
