PyBoard and Wipy stalls during gc.collect()

General discussions and questions abound development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
User avatar
danicampora
Posts: 342
Joined: Tue Sep 30, 2014 7:20 am
Contact:

Re: PyBoard and Wipy stalls during gc.collect()

Post by danicampora » Fri Nov 06, 2015 6:40 am

Wow Dave! You have some awesome debugging skills!! :-)

User avatar
dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada
Contact:

Re: PyBoard and Wipy stalls during gc.collect()

Post by dhylands » Fri Nov 06, 2015 7:50 am

ok - I figured out what the problem is.

Way back around the 5th or 6th file open, (the one that fails), it allocates an OSError object for the failure, and then allocates some traceback data here:
https://github.com/micropython/micropyt ... ept.c#L439

this is an allocation of 3 mp_uint_t's and NOT python objects (at memory location 20003bb0).

Eventually, gc_collect gets called, and it looks at the object at 20003bb0 and tries to decipher it as a python object. This is where things get messed up. It dereferences the 'type' field as if it were a pointer, but its actually just an mp_uint_t and then things get messed up when it dereferences type->attr, gets a non-NULL and tries to call the function.

So to fix this, I think that the heap either has to know its an object versus non-object, or we need to make sure that all allocations are objects. Perhaps we can tag non-object allocations by overallocating a word and stashing a special 'type' that means its raw memory.

Anyways - off to bed for me. I'll let Damien decide how he wants to fix this.

User avatar
Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: PyBoard and Wipy stalls during gc.collect()

Post by Roberthh » Fri Nov 06, 2015 12:24 pm

Hello Dave, that is a very impressive analysis.
I never looked into the Micrpython sources, so I had not clue where to start searching. And even then one must have the right hunting skills, which you obviously have.
Robert
b.t.w: Thanks for the star.

User avatar
dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada
Contact:

Re: PyBoard and Wipy stalls during gc.collect()

Post by dhylands » Fri Nov 06, 2015 3:18 pm

I thought I would post a little bit more about some of the techniques I used.

For this particular problem, it was extremely helpful to have some test code that I could run over and over again and that it would crash in a similar fashion each time.

To get at disassembled listing of micropython, I used the following:

Code: Select all

cd build-PYBV10
arm-none-eabi-objdump --disassemble firmware.elf > firmware.dis
This produces a rather large file (about 3.8 Mb), but not too bad (I used to do this with linux kernels - those disassembly files are positively huge).

Along the way, I also wanted to know where gc_alloc and then gc_free was being called from (for the particular chunk of memory that gccollect was tripping over). If I had a debugger, this would have been fairly straight forward to get. Fortunately, gcc has a few helper functions that can give you that information. So I added something like this to gc_alloc and gc_free:

Code: Select all

    if (ret_ptr == (void*)(long)0x20003bb0) {
        printf("Allocated object @ %p\n", ret_ptr);
        printf("  Caller = %p\n", __builtin_return_address(0));
    }
Having those prints added to the code, I got:

Code: Select all

5                                        
6                                        
Allocated object @ 20003bb0              
  Caller = 802009b                      
Freeing object @ 20003bb0                
  Caller = 8045e7b                      
Allocated object @ 20003bb0              
  Caller = 802e88f                      
Another thing to be aware of about ARM code, is that the program counter is odd when executing Thumb code and even when executing ARM32 code. Since the M4 only has Thumb2 code, this means that the PC will always be odd. So you need to strip off the LSB when trying to find the address in the disassembly listing.

Armed with that I could see that the first allocation occured at 802009b which was inside file_open
https://github.com/micropython/micropyt ... ile.c#L192
and the first free was also in file_open:
https://github.com/micropython/micropyt ... ile.c#L198
The second allocation (which turns out to be the interesting one) of 20003bb0 was called from 802e88f which was here:
https://github.com/micropython/micropyt ... ept.c#L439

Obviously having some familiarity with the code helps, and having the registers from the HardFault really helped.

Also, since the code and data are in separate spaces, this is one of those cases where the problem was more reproducible on the pyboard. Adding debug code made small enough changes that the problem kept reproducing at the same place each time. With desktop executables, the act of adding debug code often changes the size of your code, which in turn changes the base address of your heap, which then changes all of the addresses of allocated objects.

User avatar
danicampora
Posts: 342
Joined: Tue Sep 30, 2014 7:20 am
Contact:

Re: PyBoard and Wipy stalls during gc.collect()

Post by danicampora » Fri Nov 06, 2015 3:23 pm

Thanks Dave for sharing your knowledge! Very nice :-)

blmorris
Posts: 348
Joined: Fri May 02, 2014 3:43 pm
Location: Massachusetts, USA

Re: PyBoard and Wipy stalls during gc.collect()

Post by blmorris » Fri Nov 06, 2015 4:56 pm

I second that thank you! A few new tricks to add to the tool belt!
-Bryan

pfalcon
Posts: 1155
Joined: Fri Feb 28, 2014 2:05 pm

Re: PyBoard and Wipy stalls during gc.collect()

Post by pfalcon » Fri Nov 06, 2015 8:16 pm

dhylands, Great debugging work indeed.
dhylands wrote: So to fix this, I think that the heap either has to know its an object versus non-object, or we need to make sure that all allocations are objects.
Except we of course do this. More specifically, there's a bitmap which tracks which objects have finalizer. You can see this at https://github.com/micropython/micropyt ... /gc.c#L235 . So, further thinking should go along the line of: is finalizer bit randomly set somehow? is it not reset somehow (see https://github.com/micropython/micropyt ... /gc.c#L247)? is finalizer bitmap gets corrupted?
dhylands wrote:
Perhaps we can tag non-object allocations by overallocating a word and stashing a special 'type' that means its raw memory.
This is called "precise garbage collection" and is indeed very expensive technique.
Awesome MicroPython list
Pycopy - A better MicroPython https://github.com/pfalcon/micropython
MicroPython standard library for all ports and forks - https://github.com/pfalcon/micropython-lib
More up to date docs - http://pycopy.readthedocs.io/

User avatar
dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada
Contact:

Re: PyBoard and Wipy stalls during gc.collect()

Post by dhylands » Fri Nov 06, 2015 8:23 pm

@pfalcon - thanks for the comments.

I opened issue https://github.com/micropython/micropython/issues/1594

I had some recollection of there being a way to track whether objects had finalizers or not (but I couldn't remember any details), so I'll keep poking at this since I still have all of my debug code and I also have a nice reproducible failure (the best kind of failure :).

User avatar
dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada
Contact:

Re: PyBoard and Wipy stalls during gc.collect()

Post by dhylands » Fri Nov 06, 2015 10:37 pm

And there is a PR now: https://github.com/micropython/micropython/pull/1595

With that PR applied, the failing test file runs to the end on my pyboard.

User avatar
Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: PyBoard and Wipy stalls during gc.collect()

Post by Roberthh » Sat Nov 07, 2015 2:44 pm

Hello Dave, I tried the fix on WiPy and PyBoard with the test case, and it works! That is an very impressive analysis & fix you've done. And I see you also added some outcomes of that task to stmhal, in case of hard errors.
Best regards, Robert

Post Reply