
Heap fragmentation

Posted: Wed Dec 23, 2015 6:39 am
by pythoncoder
I strongly suspect I'm experiencing this. My query comprises two parts - firstly a request for general advice on the subject, and secondly for a view on whether my diagnosis is likely to be correct. The general questions:
  • It would seem best to instantiate large objects at the start when the heap is clean, as globals.
  • My impression is that assigning variables as globals reduces the issue. Is this valid? When is RAM for globals allocated?
  • Any other design pointers?
  • Is heap fragmentation a potential reliability issue in continuously running programs?
  • How can you assess the margin of safety for a given program?
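As a concrete illustration of the first bullet, here is a minimal sketch of the early-allocation pattern — the buffer names and sizes are purely hypothetical, not taken from the actual application:

```python
# Sketch: grab large, long-lived objects while the heap is still clean.
# RX_BUF and STATS are hypothetical names/sizes for illustration only.
import gc

RX_BUF = bytearray(512)   # e.g. a radio receive buffer, allocated at import time
STATS = [0.0] * 64        # fixed-size results array, reused on each wake-up

gc.collect()              # tidy up whatever the imports left behind
```

Because these objects live for the whole run, placing them at the low end of a fresh heap means they never contribute holes later.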
The specifics. My program pulls in a number of large libraries. It instantiates objects needing buffers in global namespace, acquires data from radio links, performs some statistics, displays the results, and goes into standby. Accordingly the memory starts off with a clean slate each time it is woken from standby. I'm finding that small changes to the code result in memory allocation failures. If I precede the failing line with

Code:

gc.collect()
micropython.mem_info()
It reports about 33KB of free memory. The stack also looks good. If I run pyb.info(1) I get a very long and busy-looking listing. Here is an example of the apparently random behaviour I'm encountering. The following function was failing at the penultimate line (seqvals is a set of up to 24 integers). The failure continued if I replaced the set-based solution with an iterative one.

Code:

def remdata(remote):
# code omitted
    missed = len({p for p in range(min(seqvals), max(seqvals)+1)}-seqvals) # Allocation fail despite 33K available
    return {'min_t1': min_t1, 'max_t1': max_t1, 'min_t2': min_t2, 'max_t2': max_t2, 'max_nfail': max_nfail, 'max_seqfail': max_seqfail, 'missed': missed, 'min_vdd': min_vdd}
Working version: declaring the dictionary as global fixed the problem.

Code:

def remdata(remote):
    global hist
# code omitted
    missed = len({p for p in range(min(seqvals), max(seqvals)+1)}-seqvals)
    hist = {'min_t1': min_t1, 'max_t1': max_t1, 'min_t2': min_t2, 'max_t2': max_t2, 'max_nfail': max_nfail, 'max_seqfail': max_seqfail, 'missed': missed, 'min_vdd': min_vdd}
    return hist
The point here is that the failure occurred in a line which didn't reference the dictionary, yet altering the dictionary's declaration changed the outcome.
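Incidentally, since seqvals holds unique integers bounded by its own min and max, the large temporary set can be avoided entirely with arithmetic — a sketch of the idea, not the fix actually used above:

```python
def missed_count(seqvals):
    # seqvals: a set of unique received sequence numbers.
    # Every element lies between min(seqvals) and max(seqvals), so the
    # count of missing values is simply the span size minus the number
    # received -- no temporary set is allocated at all.
    return max(seqvals) - min(seqvals) + 1 - len(seqvals)
```

For example, `missed_count({1, 2, 4, 7})` gives 3, the same result as `len({p for p in range(1, 8)} - {1, 2, 4, 7})`, without building either set.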

Does the team think heap fragmentation is likely to be the cause? And how can I get the code to a state where small arbitrary changes don't result in a crash? Experience of numerous crashes tells me that the program is working by a very small margin :(

Re: Heap fragmentation

Posted: Wed Dec 23, 2015 7:32 am
by dhylands
I think that heap fragmentation is a real issue.

It's quite possible that's what you're running into. I'd like to see a method added to gc which returns the size of the largest allocatable block. If that is less than the amount of memory that you failed to allocate, then you can be pretty sure that heap fragmentation is the issue.

There are several ways that people deal with this in embedded C:
1 - Do all of the allocations early on and never allocate once the program is running. This is obviously not feasible for a language like Python.
2 - Use pools of similarly sized objects. You would analyze the sizes of objects being allocated and create pools which are arrays of objects of those sizes. So you might have a pool with 100 2-block entries, 25 3-block entries, 75 4-block entries, etc.

When you allocate and free from the pools, you don't run into fragmentation issues. Supporting this would require changes to the heap.
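To make option 2 concrete, here is a hedged user-level sketch of a fixed-size pool in Python — MicroPython's heap offers no such API, so this only simulates the idea:

```python
class FixedPool:
    """Pool of equally sized buffers. Because every slot is the same
    size and interchangeable, alloc/release can never fragment."""

    def __init__(self, slot_size, count):
        self.slots = [bytearray(slot_size) for _ in range(count)]
        self.available = list(range(count))   # indices of free slots

    def alloc(self):
        # Hand out any free slot, or None if the pool is exhausted.
        return self.slots[self.available.pop()] if self.available else None

    def release(self, buf):
        # Return a slot by identity (distinct slots may compare equal).
        for i, s in enumerate(self.slots):
            if s is buf:
                self.available.append(i)
                return
        raise ValueError("buffer not from this pool")
```

A program would create one pool per common object size at startup, then draw from and return to the pools for the rest of its run.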

What causes fragmentation is when you follow a sequence something like:
- allocate some object
- allocate a 1-block object
- allocate a 2-block object
- free the 1-block object

This then leaves a 1-block hole between the other two objects, and that hole can only be used for allocating single-block objects.

Ideally you'd allocate permanent objects from one end of the heap and temporary objects from the other end of the heap.

Long-running programs will typically reach a steady-state heap usage, and then have some additional requirement beyond that. Unfortunately the steady state depends entirely on the allocation pattern of the program. If a long-running program fails to reach a steady state it generally means that it has a leak, which shouldn't be a problem with a gc. It could also fail to reach a steady state if the heap is smaller than the steady state.

Re: Heap fragmentation

Posted: Wed Dec 23, 2015 9:05 am
by pythoncoder
Thanks for that. I've put some diagnostic information here http://hinch.me.uk/dump.txt in case it gives any clues.

It's frustrating as this application is nearly complete but it's become a nightmare to code as the smallest change provokes this issue. I'll try simplifying some data structures - such as replacing dictionaries with pre-allocated global arrays - but the whole process has become too arbitrary for my taste :?

The code is simple. No interrupts, runs from start to end then goes into standby, 220 LOC. It does pull in a lot of libraries though.

Re: Heap fragmentation

Posted: Wed Dec 23, 2015 3:43 pm
by dhylands
That's getting to be a pretty full heap.

Here's how to interpret all of those letters:
. = free block
h = head block
t = tail block
m = marked head block
T = tuple
L = list
D = dict
F = float
B = byte code
M = module

Each letter represents a single block of memory, and each block is 16 bytes. So each line of the heap dump represents 0x400 bytes or 1k of memory.

An allocated object in the heap consists of a head block followed by some number (possibly zero) of tail blocks. In the list above, everything except tail and free blocks are head blocks.

Looking at your heap, I see some rather large blocks near the end of the heap (I'm specifically looking around 16940-end).

It also looks like your heap is filled with mostly data (as opposed to code), although it's possible that there might be code which isn't being classified (like native?).

This line (just before the heap dump): 1=799 2=96 m=363 says that there are 799 single-block objects, 96 2-block objects, and that the largest block is 363 blocks (or 5808 bytes).

Your memory failure is happening while trying to allocate a 4916 byte (or 308 block) object, which needs almost 5 lines of free blocks. Looking at your heap, the largest free block is 307 blocks (4 lines of 64 + 51 = 307).
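The arithmetic above can be checked mechanically. Here is a small sketch that converts a byte count to blocks and finds the largest free run in a dump string, assuming 16-byte blocks as in this port:

```python
BLOCK = 16  # bytes per GC block in this heap dump

def bytes_to_blocks(nbytes):
    # Allocations are rounded up to whole blocks (ceiling division).
    return -(-nbytes // BLOCK)

def largest_free_run(dump):
    # dump: the letters of a heap map, with '.' marking free blocks.
    best = run = 0
    for ch in dump:
        run = run + 1 if ch == '.' else 0
        best = max(best, run)
    return best
```

For instance, `bytes_to_blocks(4916)` gives 308, and 363 blocks is 363 × 16 = 5808 bytes, matching the figures quoted above.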

Re: Heap fragmentation

Posted: Wed Dec 23, 2015 5:24 pm
by pythoncoder
Interesting, thanks for that. In terms of my practical problem, simplifying data structures has achieved nothing. I plan to split the program in two. When the RTC alarm fires one does the comms and calculations, saves data for display and reboots. On reboot the other does a dumb display of the data. That reduces the amount of code imported by each program. Clunky but (hopefully) functional.

This is only feasible because of the way this application works. In a more typical embedded system with a do-forever loop I think the project would probably be a lost cause. It would be interesting to know if anyone else has hit this limitation.

When I last worked on embedded systems, dynamic storage allocation was disallowed by the company, largely owing to its non-deterministic behaviour.

Re: Heap fragmentation

Posted: Wed Dec 23, 2015 8:44 pm
by JimTal001
I went through the same issues a few weeks back with my application. Very frustrating with no predictability of where the program would crash next. It appears to me that garbage collection does not work well.

Re: Heap fragmentation

Posted: Wed Dec 23, 2015 9:00 pm
by dhylands
The garbage collection is working perfectly. Not sure why you think it isn't.

If you call gc.collect, then all of the objects left in the heap are objects which your code has references to through one of your data structures.
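The principle can be demonstrated with a weak reference — a CPython-flavoured sketch, since MicroPython doesn't ship weakref, but the collector behaves the same way:

```python
import gc
import weakref

class Node:
    pass

live = Node()                 # a strong reference our code keeps
ghost = weakref.ref(Node())   # no strong reference is retained

gc.collect()
assert ghost() is None        # the unreferenced Node was reclaimed
assert isinstance(live, Node) # referenced objects always survive a collect
```

Anything still present after a collect is, by definition, reachable from your own variables and data structures.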

Re: Heap fragmentation

Posted: Wed Dec 23, 2015 10:13 pm
by Damien
The problem is that your program is close to needing the limit of available memory. It is made worse by the fact that the code is allocating very large data structures, which need lots of contiguous GC blocks. But don't worry, we can get the code working!

First, note that micropython.mem_info(1) will dump the heap info (no need to use pyb.info(1) if you just want heap info). There is also micropython.qstr_info(), and micropython.qstr_info(1) which tell you about memory used for interned strings.

Easiest thing to do to save RAM is to put your qstrs in ROM. Do this by getting your code to a point where it can all be imported and compiled (doesn't need to run). Then execute micropython.qstr_info(1) and cut and paste all the Q(xxx) lines. Put them into stmhal/qstrdefsport.h (at the bottom will be fine). Then compile stmhal/ with these added qstrs and download the firmware to the board. Now you will have quite a bit more free RAM (test it by running micropython.qstr_info(1) again; the Q's should all be gone).

If that's not enough then the next step is to put some modules in ROM using persistent bytecode (but I need to push some code for that to work).

Re: Heap fragmentation

Posted: Sat Dec 26, 2015 6:25 am
by pythoncoder
Thanks for that, Damien - bookmarked for future reference. Splitting my code into two programs with a hard_reset() works and has the merit of running with standard firmware so I'll stick with that.

I'm deeply puzzled by the large data structures as I'm only aware of one with global scope, a bytearray of just over 6K. I evidently have something to learn about writing Python for low heap usage and will investigate. The heap browser proposed by @dhylands would be handy here ;)

Re: Heap fragmentation

Posted: Sun Jan 31, 2016 12:17 pm
by pythoncoder
@Damien Curiosity got the better of me and I've followed this up. The approach of incorporating the qstrings into the build gained me about 10K of contiguous RAM at the end of the heap which is substantial. I had to edit the qstrings before the build could proceed as some contained unprintable characters and/or illegal syntax. That aside, it worked first time ;)

It surprised me to observe that most of the memory use occurs on importing the modules. Subsequently instantiating the objects used pretty much the amount of RAM I expected: buffers explicitly declared in code. This makes the likely RAM use of a program hard to estimate (for me, anyway). Do you have any general pointers on this, and on the impact of persistent bytecode?