Debugging a memory problem

sameid · Post by **sameid** » Fri Apr 07, 2017 8:24 pm

I have a "memory leak". Why do I choose to call it a memory problem and not a memory leak in the title? Because I am not convinced of it yet...

I am running MicroPython on Windows with various GC heap sizes.

My application implements an echo server using socket code (C and Py code) that I wrote myself.

Since I don't want to "indulge" you with the pleasure of reading massive amounts of new code, I will try to explain the problem by describing the behavior I observe:

After I run my echo server for some time (the remote client sends zero/random bytes), I run out of memory.

Hypothesis 1:
Root pointers still point to objects I gradually allocate over time and eventually I run out of memory

Refute 1:
When I add a call to gc.collect at specific points in my code (like right after calling recv) - I never run out of memory - server keeps working - regardless of how small the heap is (even works with 50k).
This means that all my objects are being abandoned and eventually freed - in contrast to Hypothesis 1.
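For readers who want to try the same experiment, here is a minimal sketch of forcing a collection right after each recv. The helper name `recv_and_collect` and the `FakeConn` stand-in are hypothetical, not the code from this post; `gc.mem_free()` is MicroPython-specific, so the sketch falls back to `None` where it is missing.

```python
import gc


def recv_and_collect(conn, bufsize=512):
    """Receive one chunk, then collect immediately (hypothetical helper)."""
    data = conn.recv(bufsize)
    gc.collect()  # reclaim garbage right after recv, as described above
    # gc.mem_free() exists on MicroPython only; report None elsewhere.
    mem_free = getattr(gc, "mem_free", None)
    free = mem_free() if mem_free else None
    return data, free


class FakeConn:
    """Stand-in for a socket so the sketch runs without a network."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def recv(self, bufsize):
        # Hand back the next queued chunk, or b"" once the peer is "closed".
        return self._chunks.pop(0)[:bufsize] if self._chunks else b""
```

Logging the returned free-heap figure on every call is one way to see whether memory really does come back after each collection.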

Hypothesis 2:
Fragmentation problem - when I don't call gc.collect all the time - the heap is getting very fragmented and I can not allocate new memory.

Refute 2:
When an allocation fails, I print the heap after a call to gc.collect. There is no fragmentation, just a heap full of objects with no free space at all.

Before I go deeper:
I can print the entire memory of MP when I run out of memory and start building a reference graph.
I'm not sure this is the right approach, since I haven't even established that it is a classic memory leak problem...
So I am here trying to get better insight into memory management in MP.

Spooky action at a distance:
When I make small modifications to the code - for example - adding a global counter to the recv function in my Py code - suddenly everything works - without calling gc.collect - I don't run into memory problems - Paranormal activity?

Will appreciate the slightest help,
Sam

dhylands · Post by **dhylands** » Fri Apr 07, 2017 10:03 pm

We've had issues in the past where if you allocate a block and need say the first 8 bytes, but the last 8 bytes had a stale pointer from what was in the block previously, that this could then cause memory to not be freed when it should be.

But I'm pretty sure that was fixed.

To me the "Adding code and then things work" is very often indicative of memory and/or stack trampling (that's not python experience, but rather embedded C experience).

If you have your own custom objects (i.e. implemented in C) then it's possible to have dangling pointers within those objects which would cause the GC to hold on to memory it shouldn't.

It's probably worth continuing down the path of doing the reference graph to understand why all of those blocks are being held. You should discover a dangling pointer someplace.

I've often thought it would be useful to have a function which given the address of the block, identifies what's keeping that block in the heap. It seems like it would be useful for debugging purposes.
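A rough sketch of what such a function could look like, written against a toy model rather than the real heap: `heap` is assumed to be a dict mapping each block address to the words stored in it, `roots` is the list of words found on the stack and in globals, and any word equal to a block address is treated as a pointer. The name `retaining_path` and the data layout are illustrative assumptions, not an existing MicroPython API.

```python
from collections import deque


def retaining_path(heap, roots, target):
    """Return one chain of block addresses from a root to `target`, or None.

    heap   : dict, block address -> list of words stored in that block
    roots  : iterable of words found in root areas (stack, globals)
    target : address of the block we want to explain
    """
    parent = {}
    queue = deque()
    for word in roots:
        if word in heap and word not in parent:
            parent[word] = None  # reached directly from a root
            queue.append(word)
    while queue:
        addr = queue.popleft()
        if addr == target:
            # Walk the parent links back to the root and reverse them.
            path = []
            while addr is not None:
                path.append(addr)
                addr = parent[addr]
            return path[::-1]
        for word in heap[addr]:
            if word in heap and word not in parent:
                parent[word] = addr
                queue.append(word)
    return None  # not reachable from any root: the block is garbage
```

The first element of the returned path is the block that a root points at directly, which is usually the one worth investigating.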

stijn · Post by **stijn** » Sat Apr 08, 2017 7:56 am

sameid wrote: When I add a call to gc.collect at specific points in my code (like right after calling recv) - I never run out of memory

If you don't add that call, is garbage collection ever called? It should be called automatically, see gc_alloc function in gc.c.

sameid · Post by **sameid** » Sat Apr 08, 2017 9:47 am

Dave,

It's hard for me to understand how come it's a dangling pointer issue - because when I do add some gc.collect code in specific places - I don't experience any leaks - meaning no dangling pointers at the moment I call gc.collect.

If it is a stack issue I will not be able to find the cause by printing the heap, am I right?

stijn,
Yes it is called automatically - but by the time it calls gc.collect - no free memory is available (not a fragmentation issue) - it only works if I add gc.collect manually in my code flow - doesn't work in other places though.

Sam

dhylands · Post by **dhylands** » Sat Apr 08, 2017 7:27 pm

If it was a pointer from the stack, then you would find some objects in the heap that you can't otherwise account for.

Like most nasty bugs, once you find the real cause, it all makes sense.

pfalcon · Post by **pfalcon** » Mon Apr 10, 2017 5:12 am

sameid wrote: I can print the entire memory of MP when run out of memory and start building a reference graph.

Yes, please do it in the completely generic manner, and contribute such a tool to the project, that would be very helpful to it.

Otherwise, my recommendation is, instead of:

sameid wrote: My application implements an echo server using socket code (C and Py code) that I wrote myself.

Since I don't want to "indulge" you with the pleasure of reading massive amounts of new code

try to reproduce the issue with the echo example included with MicroPython. If you get any further questionable results, try a standard OS like Linux (it should not matter, but then the issue you described should not happen either, and if it does, challenge everything).

sameid wrote: Spooky action at a distance:
When I make small modifications to the code - for example - adding a global counter to the recv function in my Py code - suddenly everything works - without calling gc.collect - I don't run into memory problems - Paranormal activity?

No, it's the most mundane situation in any complex system: small changes lead to large differences. It's called the butterfly effect.

sameid · Post by **sameid** » Wed Jun 14, 2017 5:26 pm

I think your answer is not cool.
(Regarding viewtopic.php?t=3083#p19691)

There are many ways to implement a garbage collector.

Mark and sweep, pointer references, etc.

And even in mark and sweep there are many possible different implementations.

What I obviously meant by "not enough documentation" is the implementation details.

Of course I can read all the code, but unlike implementation details of simple data structures, the implementation of a garbage collector is a big deal!

Anyway, to the matter: after writing a new function that dumps all the current memory into a dict, and then building a graph of that memory, we found the reason for our leak.

1. We ported an external library to MicroPython by creating a module that calls its native functions and translates errors to exceptions.
2. We redirected all memory allocations of that external library to m_new.
3. We ran out of memory in some "random" cases.

Using the graph, we realized that the reason for the MemoryError is a leak of many instances of the same object belonging to our external library: an object which should have at most 5 live allocations at once (but is created repeatedly).
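To give an idea of the graph step, here is a minimal sketch under assumptions: the dump is modelled as a dict from block address to the list of words in that block, and a word counts as a pointer whenever it equals a block address. The function names are made up for illustration and this is not the tool we actually wrote.

```python
def build_referrers(heap):
    """Map every block address to the set of blocks that point at it."""
    referrers = {addr: set() for addr in heap}
    for addr, words in heap.items():
        for word in words:
            if word in heap:  # the word looks like a pointer into the heap
                referrers[word].add(addr)
    return referrers


def held_only_by_roots(referrers):
    """Blocks no other block points at: kept alive by a root, if at all."""
    return sorted(addr for addr, refs in referrers.items() if not refs)
```

Blocks with an empty referrer set are the interesting ones: if they survive a collection, something outside the heap (stack or globals) must be pointing at them.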

After some more debugging we realized that this object contains pointers to the previously allocated object and to the next object in line, but when the object is removed from the list its "prev" and "next" pointers are not zeroed (this implementation is in the external library). Since the object is allocated using m_new, these pointers are valid references, so the memory will not get freed.

But the thing is that we do leak ONE object, and this one object that we leak, holds a pointer to the next allocated object, which holds a next pointer and so on.
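The effect can be reproduced in a toy mark phase. This is a simplified model under assumptions, not MicroPython's gc.c: each node is a two-word block `[prev, next]`, the heap is a dict keyed by address, and a single stale root points at the first node.

```python
def mark(heap, roots):
    """Return the set of block addresses reachable from `roots`."""
    marked = set()
    pending = [w for w in roots if w in heap]
    while pending:
        addr = pending.pop()
        if addr in marked:
            continue
        marked.add(addr)
        # Every word that equals a block address is followed as a pointer.
        pending.extend(w for w in heap[addr] if w in heap)
    return marked


# Three nodes already removed from the library's list, links left intact.
unlinked = {0x10: [0, 0x20], 0x20: [0x10, 0x30], 0x30: [0x20, 0]}
stale_root = [0x10]

# Same nodes if the library zeroed prev/next on removal.
zeroed = {addr: [0, 0] for addr in unlinked}

kept_with_links = mark(unlinked, stale_root)
kept_when_zeroed = mark(zeroed, stale_root)
```

With the links left in place the single stale root keeps all three nodes; with zeroed links it keeps only the one it points at.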

So we had the following options:
1. Edit the external lib, zero out pointers of removed objects, leak the first object.
2. Not use m_new, use libc malloc instead and now pointers are not references, leak the first object.
3. Not touch the implementation, fix the first leaked object which holds references to all others.

We went for option 3, since all other options are hacky and they do not solve the core problem.

So who references the first leaked object?
A root pointer - stack root pointer to be exact.

Where?

In vm.c - mp_execute_bytecode

The variable "top" several "mp_execute_bytecode" deep in the stack.

The thing is that the variable "top" appears 5 times in this function, and it's not the same "top" variable as in the current "switch-case" iteration.

That got us thinking. Since the stack is scanned for root pointers, if we forget to clear a variable that points to an object on the heap, that object will never be freed. So if, inside the "for" loop in mp_execute_bytecode, we put an object in the "top" variable and then go for another iteration of the loop into some other switch-case, that object will never truly be freed: it is stuck on the stack somewhere for good.
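A toy illustration of that suspicion, and only a model: the frame below is a plain list of words, not the real layout of a mp_execute_bytecode frame, and `scan_stack` is a hypothetical stand-in for a conservative stack scan.

```python
def scan_stack(stack_words, heap):
    """Conservative scan: any word that equals a block address is a root."""
    return {w for w in stack_words if w in heap}


heap = {0x100: "closure", 0x200: "bytes"}

frame = [0, 0, 0]   # three word-sized slots of a hypothetical C frame
frame[2] = 0x100    # one switch-case arm stores an object pointer in a slot

# The arm is finished and the variable is dead, but nothing cleared the slot.
kept_before = scan_stack(frame, heap)

frame[2] = 17       # unrelated code later happens to reuse the same slot
kept_after = scan_stack(frame, heap)
```

In this model the block stays alive until something overwrites the slot, which would also be consistent with unrelated code changes making the problem appear or disappear.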

(Just for the record, when we changed the compiler optimization settings we managed to "fix" this bug, which makes us think this is truly the case here.)

So when I say I don't truly understand how the gc works - this is what I mean.

Sam

sameid · Post by **sameid** » Tue Jun 27, 2017 9:07 pm

Defining

#define MICROPY_STACKLESS           (1)

solves the problem, but I still can't say what the problem is...

As I said before - the problem is that somewhere in the call to mp_execute_bytecode, the stack contains a pointer to the heap. I know the stack part is definitely in this function because when I debug it I see that it is on the stack between local variables of the mp_execute_bytecode function.

The thing is - the pointer points to a closure object that is not even allocated in the bytecode belonging to the python function being executed!

I might be going for a long shot here, but I use a lot of closures and generators with yields, and I run on Windows. Is there anything I should be worried about?

Thanks!

Damien · Post by **Damien** » Wed Jun 28, 2017 2:53 am

sameid wrote: After some more debugging we realized that this object contains pointers to previously allocated object and the next object in line - but when the object is removed from the list its "prev" and "next" pointers are not zeroed - this implementation is in the external library. Since the object is allocated using m_new, these pointers are valid references - memory will not get freed.

But the thing is that we do leak ONE object, and this one object that we leak, holds a pointer to the next allocated object, which holds a next pointer and so on.

From what I understand in your description here the GC is doing the correct thing and retaining all the objects because they are all reachable from some root pointer. In particular, the first object is reachable because you need it. The next one is reachable because the first object points to it, and so on.

When you free the other objects (all but the first), do you use explicit m_free() calls, or are you just leaving it up to the GC to free them? If it's the latter then they won't ever be freed because there are pointers remaining.

sameid wrote: Which got us thinking - since the stack is root pointers, if we forget to clean a variable that points to an object on the heap, it will never be free, so if in the "for" in mp_execute_bytecode, we put an object in the "top" variable and then go for another run of the loop going in some other switch-case - that object will never be truly freed, it is stuck in the stack somewhere for good.

Usually, if an object is on the Python stack (in the mp_execute_bytecode function) then it is in principle reachable from some Python code. Even if you don't explicitly reference the variable (in this case "top") it's still in principle reachable so must not be collected.

sameid wrote: The thing is - the pointer points to a closure object that is not even allocated in the bytecode belonging to the python function being executed!

Closure objects are allocated outside the bytecode function, when setting up the function state for that function to be executed. Closure objects are used whenever your function closes over a variable (which is sometimes not obvious at first sight when looking at the code).

sameid · Post by **sameid** » Wed Jun 28, 2017 5:33 am

Damien wrote: From what I understand in your description here the GC is doing the correct thing and retaining all the objects because they are all reachable from some root pointer. In particular, the first object is reachable because you need it. The next one is reachable because the first object points to it, and so on.

When you free the other objects (all but the first), do you use explicit m_free() calls, or are you just leaving it up to the GC to free them? If it's the latter then they won't ever be freed because there are pointers remaining.

Yes, exactly!
I don't use explicit calls to m_free. I could fix the chained objects (by editing the external library code) and zero the next/prev pointers when they are destroyed (the library doesn't do it because it cannot know that it will be run on MicroPython, where pointers are references).
The thing is, that would only be half a patch, because I would still leak the first object, the one referenced from the stack (which is unused).
The real fix is to understand why I leak this "first" object (it's not actually the first; sometimes it is object 17, 56 or 1000) and stop leaking it. Then, when the GC runs on its own, it will clean up all of them, since nothing points to them anymore.

Damien wrote: Usually, if an object is on the Python stack (in the mp_execute_bytecode function) then it is in principle reachable from some Python code. Even if you don't explicitly reference the variable (in this case "top") it's still in principle reachable so must not be collected.

The variable top is actually on the C stack. It might also be some other stack variable, not "top", that shares the same stack address because of optimizations applied to variables inside the switch-case in mp_execute_bytecode. We've seen that the pointer is sometimes in a variable qst, but qst cannot point to an object of type closure, so there is some kind of variable union in the function.

Damien wrote: Closure objects are allocated outside the bytecode function, when setting up the function state for that function to be executed. Closure objects are used whenever your function closes over a variable (which is sometimes not obvious at first sight when looking at the code).

Yes, I understand that perfectly.
I'm just saying that I have a pointer on the stack, in a very old stack frame, and this pointer sits among the C stack variables of the function mp_execute_bytecode.
It points to a closure that is not even present in the function being executed. In other words, this function doesn't hold a closure, and it doesn't have a variable that might hold a closure.

And using stackless fixes the problem, somehow.

MicroPython Forum (Archive)
