uPython Memory Manager

cloudformdesign
Posts: 35
Joined: Wed Mar 11, 2015 7:48 pm

uPython Memory Manager

Post by cloudformdesign » Wed Mar 11, 2015 8:09 pm

Damien's development wishlist http://forum.micropython.org/viewtopic.php?f=3&t=455 lists "Eliminate heap-memory fragmentation." To help achieve this goal, I would like to start a thread on how the current memory manager works, and build up documentation on it as we move forward designing a new one.

I have created a Wiki Page for this here:
https://github.com/micropython/micropyt ... ry-Manager

I am currently working on a C memory manager, usable for general applications, that I am calling tinymem. I want to tailor it to be usable for all of uPython's needs. First I need to understand how uPython's current implementation works, and I think that documenting uPython's approach would be a helpful learning experience.

cloudformdesign
Posts: 35
Joined: Wed Mar 11, 2015 7:48 pm

Re: uPython Memory Manager

Post by cloudformdesign » Sun Mar 15, 2015 4:03 am

Ok, I am making some good progress, though it is not yet usable. Check out my goals and overview for how I am building this here. I think I understand everything I need in order to implement a fast memory manager with defragmentation! Very exciting.

https://github.com/cloudformdesign/tinymem

pfalcon
Posts: 1155
Joined: Fri Feb 28, 2014 2:05 pm

Re: uPython Memory Manager

Post by pfalcon » Sun Mar 15, 2015 11:56 pm

So, memory managers are surely an interest of mine, and I had a quick look. I have to say that the current README is not exactly clear or easy to understand. I also have to say that writing a good memory manager is a hard task. Proving that it's good is even harder.

So, let's look at what you have so far (and sorry if I misunderstood something per above).
cloudformdesign wrote: Consolidates all freed memory together without movement
- constructs a sorted list of all freed values, ordered by location
- this uses tm_tempalloc to get data from the end of the heap that will be freed before returning to userspace
You seem to assume that when the heap is fragmented you can allocate something extra in order to defragment it. Surely, that doesn't work. Or you assume that besides the controlled memory there's some other memory which you can get whenever you want. The only way that can be true is if you set aside the needed memory beforehand, cutting that chunk from the total memory you have. For example, you say that you have a 64K heap and can allocate a maximum of 8K objects in it? Well, then there can be 8K-1 free chunks between those allocated objects (maybe more), and you need 16K of memory to sort them.
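
Spelling out that arithmetic as a back-of-the-envelope check (this just restates the numbers above; nothing below is tinymem code):

Code: Select all

/* Worst case for a sorted free list in a small heap. */
#include <stdio.h>

int main(void) {
    unsigned heap_size   = 64 * 1024;  /* 64K heap */
    unsigned max_objects = 8 * 1024;   /* up to 8K live objects */
    /* Interleaved allocations can leave ~one free gap per object;
     * a sorted list needs at least a 2-byte offset per gap. */
    unsigned sort_space = max_objects * 2;
    printf("bookkeeping: %u of %u bytes (%u%%)\n",
           sort_space, heap_size, 100 * sort_space / heap_size);
    return 0;
}

That is a quarter of the heap consumed by the sort alone.
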
cloudformdesign wrote: There are 12 bins. The bins are the following byte sizes: 1, 2, 3, ...
A basic assumption of almost every memory manager is allocation granularity. It doesn't make sense to allocate 1 byte of memory; there will be more overhead in managing it than in the allocation itself. Just imagine: if you say allocations go in multiples of 4, you can convert your lowly restriction of a 16K max allocation into a max allocation of the whole heap size, which does work in the real world, unlike 16K.
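
A minimal sketch of that idea (the 14-bit field width is my assumption to make the 16K limit concrete; this is not tinymem's actual layout): a size field counting 4-byte units describes four times the range of the same field counting bytes.

Code: Select all

#include <stdint.h>

#define TM_ALIGN 4u

/* Round a request up to the allocation granularity. */
static inline uint32_t tm_round_up(uint32_t size) {
    return (size + (TM_ALIGN - 1)) & ~(TM_ALIGN - 1);
}

/* If a 14-bit size field counting bytes explains the 16K cap,
 * the same field counting 4-byte units reaches 64K - 4, i.e.
 * essentially any allocation a 64K heap can satisfy. */
static inline uint32_t tm_size_units(uint32_t size) {
    return tm_round_up(size) / TM_ALIGN;
}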

There are also possible thinkos in the code. For example, what do you think the following code does:

Code: Select all

#define TM_MAX_FILLED_PTRS     (TM_MAX_POOL_PTRS / 8 + TM_MAX_POOL_PTRS % 8)

Summing up: writing a good memory manager is hard. No, very hard. The basis of a defragmenting memory manager is double pointer indirection plus a semi-space. Yes, that means giving away more than half of your useful memory. If you're interested in a defragmenting memory manager for uPy, I suggest you implement that first, then try to improve on it. Improving on it will be hard; you immediately lose on performance (and double indirection and semi-space copying are already a big performance hit).
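
For readers following along, a minimal sketch of that baseline (this is the textbook technique with hypothetical names, not uPy or tinymem code): user code holds handles into a table, and compaction copies live objects into the other semi-space, patching only the table entries.

Code: Select all

#include <stddef.h>
#include <string.h>

#define HEAP_HALF (32 * 1024)       /* each semi-space is half the heap */
#define MAX_OBJS  1024

static char space_a[HEAP_HALF], space_b[HEAP_HALF];
static char *from_space = space_a, *to_space = space_b;
static size_t bump;                 /* bump pointer within from_space */

/* The indirection: user code holds table indexes, never raw
 * pointers, so objects can move without breaking references. */
static struct { char *ptr; size_t size; int live; } table[MAX_OBJS];

/* Bump-allocate in from_space; returns a handle, not a pointer. */
static int alloc(size_t size) {
    if (bump + size > HEAP_HALF) return -1;   /* full: compact() first */
    for (int i = 0; i < MAX_OBJS; i++) {
        if (table[i].live) continue;
        table[i].ptr  = from_space + bump;
        table[i].size = size;
        table[i].live = 1;
        bump += size;
        return i;                   /* caller keeps this handle */
    }
    return -1;                      /* handle table exhausted */
}

/* Compact: copy every live object into to_space, fix its table
 * entry, then swap the semi-spaces. Dead objects vanish for free. */
static void compact(void) {
    size_t out = 0;
    for (int i = 0; i < MAX_OBJS; i++) {
        if (!table[i].live) continue;
        memcpy(to_space + out, table[i].ptr, table[i].size);
        table[i].ptr = to_space + out;
        out += table[i].size;
    }
    char *tmp = from_space; from_space = to_space; to_space = tmp;
    bump = out;
}

Every access pays the extra hop through the table, and half the heap sits idle as copy space -- exactly the performance and memory costs being weighed above.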

Damien
Site Admin
Posts: 647
Joined: Mon Dec 09, 2013 5:02 pm

Post by Damien » Mon Mar 16, 2015 12:48 am

I agree it is hard. But that doesn't mean it shouldn't be attempted!

We have a good test for any new memory manager: the uPy test suite. If your memory manager passes all tests on all platforms then it is doing very well.

cloudformdesign
Posts: 35
Joined: Wed Mar 11, 2015 7:48 pm

Re: uPython Memory Manager

Post by cloudformdesign » Mon Mar 16, 2015 3:09 am

Hey pfalcon, thanks for taking a look at the code and giving all that feedback.

Most of your comments come from (I think) a misunderstanding of the basics of how tinymem works, which is clearly my fault -- if someone like you couldn't understand it, then it needs to be better explained. To help facilitate understanding I have written this Google doc: https://docs.google.com/document/d/1qxT ... D8l2JyfNE/. I'm pretty sure this is not an impossible problem when using pointer indirection. I will be using that document for development discussion, and the README to give an overview of how the module works.

To answer your questions:
I think my README jumped into the gritty details a bit too fast. I've already put a lot of thought into how to optimize tinymem, and the README currently reflects those thoughts. The document I linked and the future README will be better.

For instance, I am not "allocating to defragment". I have a freed array that stores which indexes are free. Deallocation does two steps:
- combines freed indexes that are right next to each other
- moves data that has a higher location value into empty space (note that without freed bins this would be VERY slow, but still possible)

Ignore the freed bins for now; they are not necessary and are only used to speed up various aspects of the memory manager. In the simplest (and very, very slow) case, allocation would be as follows:
- look through a linear "freed" array for a freed chunk at least as large as the request
- if none exists, take memory off the end of the heap

Defragmenting would then clean up any leftover holes at a later time (a sketch of this simplest path follows).
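
A minimal sketch of that slow path, under my reading of the steps above (names and layout are hypothetical, not tinymem's actual API):

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define HEAP_SIZE (64 * 1024)
#define FREED_MAX 256

typedef struct { uint16_t loc, size; } tm_freed_t;   /* one freed hole */

static uint8_t    heap[HEAP_SIZE];
static size_t     heap_top;                          /* end-of-heap bump */
static tm_freed_t freed[FREED_MAX];
static size_t     freed_len;

/* Simplest allocation: linear-scan the freed array for a hole at
 * least as large as the request; otherwise take off the heap end. */
static void *tm_alloc_simple(uint16_t size) {
    for (size_t i = 0; i < freed_len; i++) {
        if (freed[i].size >= size) {
            uint16_t loc = freed[i].loc;
            freed[i].loc  += size;                   /* shrink the hole */
            freed[i].size -= size;
            if (freed[i].size == 0)
                freed[i] = freed[--freed_len];       /* drop empty hole */
            return &heap[loc];
        }
    }
    if (heap_top + size > HEAP_SIZE) return NULL;    /* out of memory */
    void *p = &heap[heap_top];
    heap_top += size;
    return p;
}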

- "#define TM_MAX_FILLED_PTRS (TM_MAX_POOL_PTRS / 8 + TM_MAX_POOL_PTRS % 8)" was actually a bug, haha. It is supposed to be
(TM_MAX_POOL_PTRS / 8 + (TM_MAX_POOL_PTRS % 8 ? 1 : 0)). It is currently being fixed (and tested for the first time) in the malloc branch.
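
To make the thinko concrete for anyone skimming (demonstration code only, not from tinymem): the buggy macro only agrees with the correct byte count when TM_MAX_POOL_PTRS % 8 is 0 or 1.

Code: Select all

#include <stdio.h>

/* Buggy: floor(n/8) + (n mod 8) overshoots whenever n % 8 > 1. */
#define FILLED_BUGGY(n) ((n) / 8 + (n) % 8)
/* Fixed: ceiling division -- one bitmap byte per 8 pool pointers. */
#define FILLED_FIXED(n) ((n) / 8 + ((n) % 8 ? 1 : 0))

int main(void) {
    /* n = 10 needs ceil(10/8) = 2 bytes; the buggy macro says 3. */
    printf("buggy(10)=%d fixed(10)=%d\n", FILLED_BUGGY(10), FILLED_FIXED(10));
    printf("buggy(16)=%d fixed(16)=%d\n", FILLED_BUGGY(16), FILLED_FIXED(16));
    return 0;
}

The idiomatic spelling of the fixed version is (TM_MAX_POOL_PTRS + 7) / 8, which computes the same ceiling.
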
pfalcon wrote: Summing up: writing a good memory manager is hard. No, very hard. The basis of a defragmenting memory manager is double pointer indirection plus a semi-space. Yes, that means giving away more than half of your useful memory. If you're interested in a defragmenting memory manager for uPy, I suggest you implement that first, then try to improve on it. Improving on it will be hard; you immediately lose on performance (and double indirection and semi-space copying are already a big performance hit).
I would like to convince you that you only need single pointer indirection, and that you don't have to give away half of the useful memory. I will be fixing up the documentation, and I would love to chat over Google Hangouts to explain the inner workings if you would like.

cloudformdesign
Posts: 35
Joined: Wed Mar 11, 2015 7:48 pm

Re: uPython Memory Manager

Post by cloudformdesign » Mon Mar 16, 2015 1:22 pm

Also, tinymem ONLY implements a malloc/free architecture that takes care of fragmentation. It does NOT do garbage collection or reference counting. This is to keep it as simple and general as possible.

However, you should be able to adapt your garbage collector to use it fairly easily (as it is a simple malloc/free interface).

pfalcon
Posts: 1155
Joined: Fri Feb 28, 2014 2:05 pm

Re: uPython Memory Manager

Post by pfalcon » Mon Mar 16, 2015 11:24 pm

cloudformdesign wrote: I would like to convince you that you only need single pointer indirection, and that you don't have to give away half of the useful memory.
Surely that's possible - at the expense of performance (or maybe even not, but then it's got to be something really smart, possibly imposing more arbitrary restrictions). And the best way to convince me is to provide a working and tested implementation ;-).

pfalcon
Posts: 1155
Joined: Fri Feb 28, 2014 2:05 pm

Re: uPython Memory Manager

Post by pfalcon » Mon Mar 16, 2015 11:34 pm

cloudformdesign wrote: However, you should be able to adapt your garbage collector to use it fairly easily (as it is a simple malloc/free interface).
No, that's not how uPy works - it doesn't use a "simple malloc/free interface" for its memory allocation. It maintains its own memory arena, with a bitmap tracking each block's state. That arena is of course allocated from the system using malloc(), but that's as far as external malloc usage goes. uPy can of course "emulate" malloc on top of its own arena for internal needs.

Making the uPy GC pluggable, so it can work with the system heap (or any external memory manager), is a worthy aim to follow. Note that to support a tracing GC, the memory manager should provide two things: the ability to enumerate all allocations, and the ability to store a mark bit. Of course, these can be handled outside the memory manager, by storing that information "at user level" in addition to the garbage-collectable objects themselves.
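
To illustrate, a minimal sketch of a bitmap arena with two bits of state per block (loosely in the spirit of uPy's gc.c, but the names, widths, and layout here are my simplifications, not uPy's actual code):

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 16
#define NUM_BLOCKS 4096

/* Two bits of state per block; the GC's mark bit lives in the
 * same table as the allocation state. */
enum { AT_FREE = 0, AT_HEAD = 1, AT_TAIL = 2, AT_MARK = 3 };

static uint8_t arena[NUM_BLOCKS * BLOCK_SIZE];
static uint8_t atb[NUM_BLOCKS / 4];          /* 4 two-bit entries/byte */

static unsigned atb_get(size_t b) {
    return (atb[b / 4] >> ((b % 4) * 2)) & 3;
}

static void atb_set(size_t b, unsigned v) {
    atb[b / 4] = (uint8_t)((atb[b / 4] & ~(3 << ((b % 4) * 2)))
                           | (v << ((b % 4) * 2)));
}

/* Enumerate all allocations: every head block starts one. */
static void for_each_alloc(void (*fn)(void *p)) {
    for (size_t b = 0; b < NUM_BLOCKS; b++)
        if (atb_get(b) == AT_HEAD || atb_get(b) == AT_MARK)
            fn(&arena[b * BLOCK_SIZE]);
}

A tracing GC gets both requirements from this one table: enumerating allocations is a scan for head blocks, and marking is just flipping a head to the marked state.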

cloudformdesign
Posts: 35
Joined: Wed Mar 11, 2015 7:48 pm

Re: uPython Memory Manager

Post by cloudformdesign » Tue Mar 17, 2015 12:50 am

pfalcon wrote: Surely that's possible - at the expense of performance (or maybe even not, but then it's got to be something really smart, possibly imposing more arbitrary restrictions). And the best way to convince me is to provide a working and tested implementation ;-).
Agreed; that's what I'm currently trying to do! Please take a look at the Google doc I linked; I would really appreciate your feedback and comments. That document is currently my design guide.

If you give me your email I can give you permission to edit and comment on the document.

cloudformdesign
Posts: 35
Joined: Wed Mar 11, 2015 7:48 pm

Re: uPython Memory Manager

Post by cloudformdesign » Tue Mar 17, 2015 9:44 am

I just thought of a major problem with my current "binning" scheme. The way I was doing it was "only move data left into bins". Each bin had pointers to freed memory locations.

My algorithm would move data from a higher memory address into a bin that was guaranteed to fit it. For instance, it would move a 156-byte value into a 256-byte (or greater) freed array. Also, it only moved memory left, toward lower addresses.

This has a pretty clear problem -- it will always cause data near the beginning of the array to get highly fragmented! Oops! (Unless people were making a lot of requests for 1-3 byte values... which would be bad programming on their part.)

I have already thought of a different solution.

Imagine that all freed values are kept in a hash table -- let's say a 30-element hash table for now. Each element is an array that can grow (because there will be lots of overlap). The arrays are sorted by location, so you always look at the earliest location first.

Now when the user requests data, or the defragmenter wants to move it, a hash lookup gives you a reasonably fast way to get data of that exact size at the earliest location. All good so far.
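
A minimal sketch of that lookup, under my reading of the scheme (30 buckets keyed by size, each kept sorted by location; all names here are hypothetical):

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define NUM_BUCKETS 30
#define BUCKET_CAP  32          /* fixed here; growable in the real thing */

typedef struct { uint16_t loc, size; } hole_t;

/* Each bucket holds the holes whose sizes hash to it, sorted by
 * location, so the first size match is also the earliest hole. */
static hole_t buckets[NUM_BUCKETS][BUCKET_CAP];
static size_t bucket_len[NUM_BUCKETS];

static size_t size_hash(uint16_t size) { return size % NUM_BUCKETS; }

/* Find the earliest-located hole of exactly `size` bytes. */
static hole_t *find_exact(uint16_t size) {
    size_t h = size_hash(size);
    for (size_t i = 0; i < bucket_len[h]; i++)
        if (buckets[h][i].size == size)
            return &buckets[h][i];
    return NULL;                /* fall back to heap end / defragment */
}

Because each bucket stays location-sorted, exact-size requests automatically pack toward the front of the heap.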

But I think there is more to this story. There might be a way to take advantage of packing multiple pointers into one empty slot. You would have to have some way of knowing that, after the operation, you have reduced the total number of freed arrays. If you can know that, you know you are reducing fragmentation.

I'm still thinking about this... but I think it might work. Regardless, a hash table will actually solve the problem for most cases, and the "full defragmentation" method I outlined will solve it for all cases, every time (but be very slow). I've given some thought to both implementations, and I think for both of them you can have a "rolling defragment", where you defragment for a little while and then let go of the CPU to let a few other threads run before you continue.
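
To make "rolling defragment" concrete, a hedged sketch of the control flow I have in mind (the per-hole step is a stub, and nothing here is tinymem code):

Code: Select all

#include <stdbool.h>
#include <stddef.h>

#define DEFRAG_BUDGET 8          /* holes per slice; tune per platform */

static size_t num_holes = 100;   /* stand-in for the real freed count */
static size_t cursor;            /* where the last slice stopped */

/* Stand-in for the real per-hole compaction step (hypothetical). */
static void defrag_one_hole(size_t i) { (void)i; }

/* One bounded slice of defragmentation: handle at most DEFRAG_BUDGET
 * holes, remember where we stopped, and report whether a full pass
 * has completed, so the caller can yield the CPU between slices. */
static bool defrag_step(void) {
    size_t budget = DEFRAG_BUDGET;
    while (budget-- && cursor < num_holes)
        defrag_one_hole(cursor++);
    if (cursor >= num_holes) {
        cursor = 0;              /* pass complete; restart next time */
        return true;
    }
    return false;                /* more work left */
}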
