Re: Assert error during garbage collection

Posted: Fri Apr 17, 2020 8:06 am
by Roberthh
Is there documentation somewhere about how the memory layout should look and how gc.collect() should be structured? I do have my doubts about the memory layout of the W600 port. Questions like:

- is it permitted for the stack of the main task to be allocated statically?
- is there a specific order in memory expected for heap and stack?

Re: Assert error during garbage collection

Posted: Fri Apr 17, 2020 10:44 am
by stijn
I'm afraid there's no documentation other than the code and the comments therein. Or at least I haven't seen such documentation.

For your questions: in the end the only thing that matters for the memory allocator/garbage collector is that it knows precisely where your memory is, i.e. the address and size (both for heap and stack). That's also how gc_init and gc_collect_root operate: they just take pointers and sizes.

As such, I don't think it matters at all where that memory comes from, whether it's statically allocated or not, or where one region sits relative to the other (well, as long as there's no overlap). The same goes for tracing root pointers: if you tell the garbage collector correctly where they are, it's fine, wherever that is. That last part was the problem in the aforementioned bug: the gc_collect_regs_and_stack function assumed that everything allocated was referenced from somewhere between the current stack pointer and the top of the stack, but due to heavy optimizations some pointers fell outside of that region.
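To illustrate why only addresses and sizes matter, here is a minimal host-side sketch of a conservative root scan. It is not MicroPython's implementation: `toy_heap_t` and `toy_scan_roots` are hypothetical names, and a real collector would mark the referenced block rather than count hits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical toy heap description: the scanner only needs a start
// address and a length, wherever the memory came from.
typedef struct {
    uintptr_t start;  // address of the first heap byte
    size_t len;       // heap size in bytes
} toy_heap_t;

// Count the words in [roots, roots + n_words) whose value lies inside
// the heap. A real collector would mark the block instead of counting.
static size_t toy_scan_roots(const toy_heap_t *heap,
                             const uintptr_t *roots, size_t n_words) {
    assert(heap != NULL && roots != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < n_words; i++) {
        uintptr_t v = roots[i];
        // The subtraction form avoids overflow in start + len.
        if (v >= heap->start && v - heap->start < heap->len) {
            hits++;
        }
    }
    return hits;
}
```

If the scanned range is too short, or starts at the wrong address, a live pointer is simply never seen, which is the failure mode described above.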

Note: I should stress that this is only how I think it is and I am not 100% sure.

Re: Assert error during garbage collection

Posted: Fri Apr 17, 2020 12:24 pm
by jimmo
Yep, that all sounds right to me. On baremetal ports, the heap is usually in a region defined by the linker (which is everything left over in RAM after bss, data and stack). On the Unix and RTOS ports, it's created with malloc or a static fixed-size buffer.

One thing about the REPL is that it can result in a reasonable amount of calls to gc_free and gc_realloc.

Like stijn said, the relative location of everything isn't important, just so long as the regions don't accidentally overlap.

Whenever I've worked on GC-related experiments, I've enabled debug printing (and usually added a lot more printfs) and in particular tracing every pointer returned by gc_alloc and passed to gc_realloc/gc_free, and then done a lot of manual work to find issues.

It does sound an awful lot like root pointer discovery could be the issue here... and there's so many ways it can go wrong (and it's highly port/arch specific so a likely candidate for a new port). Do you have a way to hook up GDB to the chip you're using?

Do you have a link to the github repo for the port you're working with?

Re: Assert error during garbage collection

Posted: Fri Apr 17, 2020 1:42 pm
by stijn
Just a note, but depending on which debug printing you use, it might itself allocate which makes things way harder to follow.
So yeah a debugger can really help here.
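If no debugger is available, one way around the allocation problem is to record events in a static ring buffer and dump it later. This is only a sketch under that assumption: `trace_record`, `trace_last_op` and `TRACE_LEN` are hypothetical names, not part of MicroPython.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_LEN 64  // hypothetical capacity; must be a power of two

typedef enum { TRACE_ALLOC, TRACE_REALLOC, TRACE_FREE } trace_op_t;

typedef struct {
    trace_op_t op;
    uintptr_t ptr;
} trace_entry_t;

static trace_entry_t trace_buf[TRACE_LEN];
static size_t trace_count;  // total number of events recorded so far

// Record one event in static storage; this never touches the heap.
static void trace_record(trace_op_t op, const void *ptr) {
    assert((TRACE_LEN & (TRACE_LEN - 1)) == 0);
    trace_entry_t *e = &trace_buf[trace_count & (TRACE_LEN - 1)];
    e->op = op;
    e->ptr = (uintptr_t)ptr;
    trace_count++;
}

// Return the most recent operation recorded for ptr, or -1 if the
// pointer does not appear in the retained window.
static int trace_last_op(const void *ptr) {
    size_t n = trace_count < TRACE_LEN ? trace_count : TRACE_LEN;
    for (size_t i = 0; i < n; i++) {
        const trace_entry_t *e =
            &trace_buf[(trace_count - 1 - i) & (TRACE_LEN - 1)];
        if (e->ptr == (uintptr_t)ptr) {
            return (int)e->op;
        }
    }
    return -1;
}
```

Calling `trace_record` from the allocate, reallocate and free paths and checking `trace_last_op` at the point of the assert would show whether the pointer being freed was last seen as an allocation or as an earlier free.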

Re: Assert error during garbage collection

Posted: Fri Apr 17, 2020 1:49 pm
by Roberthh
Hi @jimmo. No, I cannot debug. I did some manual stack tracing (print the stack and match it to the loader map), but I only matched a few addresses to functions. The port I'm sticking my nose in is the w600 port of @wdyichen (winner micro). Nice low cost chip.

It is in the w60x branch of my micropython fork:
The issue with details is here:
It is important that the script is called from the REPL by repeating cursor-up/enter, thereby calling parse_compile_execute() again. Otherwise it will not fail.

Edit: WM says it requires gcc 4.9, but it also works fine with gcc 7.3.1 (the one I am using).
And it requires the SDK mentioned in the Besides that, you can just copy over the w60x file tree and add it as a port to your micropython files. No more interference, I hope.

Re: Assert error during garbage collection

Posted: Fri Apr 17, 2020 4:24 pm
by dhylands
Also check the initial stack pointer. By this, I'm referring to the value that gets assigned to the stack pointer at RESET time. On Cortex-M3s (I think the w600 is an M3) this should be the very first word in flash.

In the CMSIS startup_stm32f3xxxxxx.s files this will normally be the symbol _estack.

I'd check that in the linker map file to see what value _estack was assigned. Make sure that the amount of RAM in your MCU matches what the linker control (the .ld file) says.

It's also worthwhile to check for "shadows". In many MCUs, the RAM is visible at the advertised address and then it may repeat at higher addresses. When starting with a new processor, I typically write a test which writes the address at each 4 byte location in memory and then goes back and re-reads the memory to verify that the address that's read back is expected. You need to be careful not to do this where your stack is. You can often get away with just doing this every 1K or 4K rather than every word as well.

Where shadows become problematic is when you think that there is say 64K of RAM, but there is actually only 32K followed by a 32K shadow. You put your stack at the end of the 64K and for a while everything is good, until your stack starts corrupting things down at the 32K mark because your data use has grown.
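The address-in-each-word test can be tried out on a host first. The sketch below simulates RAM whose address decoder wraps around; `sim_ram_t` and `find_shadow` are hypothetical names. On real hardware the reads and writes would go through volatile pointers to the actual RAM range, skipping the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Simulated RAM: only phys_words words exist, and accesses beyond that
// wrap around, as on a part that ignores the upper address bits.
typedef struct {
    uint32_t *cells;
    size_t phys_words;  // number of words that physically exist
} sim_ram_t;

static void sim_write(sim_ram_t *ram, size_t index, uint32_t value) {
    ram->cells[index % ram->phys_words] = value;
}

static uint32_t sim_read(const sim_ram_t *ram, size_t index) {
    return ram->cells[index % ram->phys_words];
}

// Write each probed word's own index into it, then read everything back.
// Returns the index of the first word that no longer holds its own index
// (it was overwritten through an alias), or claimed_words if all match.
static size_t find_shadow(sim_ram_t *ram, size_t claimed_words, size_t stride) {
    assert(stride > 0);
    for (size_t i = 0; i < claimed_words; i += stride) {
        sim_write(ram, i, (uint32_t)i);
    }
    for (size_t i = 0; i < claimed_words; i += stride) {
        if (sim_read(ram, i) != (uint32_t)i) {
            return i;
        }
    }
    return claimed_words;
}
```

Note that the mismatch shows up at a low index: the write through the shadow at the top of the claimed range clobbers the real word at the bottom.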

Re: Assert error during garbage collection

Posted: Fri Apr 17, 2020 6:41 pm
by Roberthh
Thank you Dave for your hints. I assume that these aspects are well covered by the RTOS on which the port is based. In general, MicroPython runs well. It is just this assert error, which happens under certain conditions. I dumped the stack when it happens and manually matched the addresses from the code space to functions. The memory fault always happens after a garbage collection has fully executed during the parse_compile_execute() call in the friendly REPL loop. During garbage collection, the function stack in gc_collect is e.g.:

Code:

Offset 5, 0x801090d         py/gc gc_alloc 0x8010858 0xb5
Offset 10, 0x808e7c4        lib/utils/pyexec 0x808e70c 0xb8 
Offset 13, 0x8010235        py/malloc m_malloc 0x801022a 0x0b
Offset 15, 0x80148a3        py/parse  mp_parse 0x8014884 0x1f
Offset 23, 0x808e7c4        lib/utils/pyexec 0x808e70c 0xb8
Offset 27, 0x8014035        py/lexer mp_lexer_new 0x8013fc8 0x6d
Offset 30, 0x8013669        py/reader mp_reader_mem_close 0x013668 0x01
Offset 31, 0x801367d        py/reader mp_reader_new_mem 0x801367c 0x01
Offset 36, 0x808e7cc        lib/utils/pyexec 0x808e70c 0xc0
Offset 39, 0x808e7c4        lib/utils/pyexec 0x808e70c 0xb8
I omitted the data values in the stack.
When the assert happens in gc_free, it's:

Code:

Offset 2, 0x807d8fa         py/parse rule_arg_offset_table 0x807d7d8 0x122
Offset 9, 0x801029f         py.malloc.c m_free()  0x8010298 0x7 
Offset 11, 0x801138d        py/vstr vstr_clear()  0x0008011374 0x19 
Offset 13, 0x8014099        py/lexer  mp_lexer_free() 0x8014084 0x15 
Offset 15, 0x8014d2b        py/parse  mp_parse() 0x8014884 0x4a7
Offset 36, 0x808e7cc        lib/utils/pyexec 0x808e70c 0xc0 
Offset 39, 0x808e7c4        lib/utils/pyexec 0x808e70c 0xb8 
When it fails, it's always in mp_parse() during the call to mp_lexer_free() at the very end of mp_parse().
The entries at offsets 36 and 39 may not belong to that bad behavior. Also, the stack usage in calls of gc_collect is much larger when all runs well than when it fails: in the good case the stack is about 300 words deep, in the failing cases about 80.

Re: Assert error during garbage collection

Posted: Fri Apr 17, 2020 9:24 pm
by dhylands
Looking at your call stack, this is happening when parse calls gc_free explicitly (via mp_lexer_free).

So your assert could be happening because the lexer object was already freed by a call to gc_collect (between the allocation of the lexer object and the free).

It looks like the pointer to the lexer object lives in a local variable of the parse_compile_execute function: ... exec.c#L85

This means that the pointer will either live on the stack or in a register. This suggests that the lexer pointer might not be getting picked up by the gc_collect function. The gc_helper_get_regs_and_sp function is responsible for putting all of the registers into a buffer on the stack: ... lect.c#L48
and then the call to gc_collect_root: ... lect.c#L51
should be detecting the pointer to the lexer on the stack which should cause gc_collect to NOT release it. However if the pointer to the lexer object isn't found, then it will get freed, and the later call to mp_lexer_free will cause the assert that you're seeing.

If you've got a debugger available, then I would save a pointer to the mp_lexer object in a global and set the global to 0 when it's freed. Then add an if statement inside gc_collect that can be used to trigger a breakpoint in the debugger when that global is non-null. You should then be able to look at the stack and verify that the region scanned by gc_collect_root contains the pointer to the lexer object.
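A rough sketch of such a hook, which can also be used without a debugger by printing the result. `dbg_watch_ptr` and `dbg_watch_in_range` are hypothetical names; the idea is to call the check with the same start pointer and word count that are passed to gc_collect_root.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical debug global: set it right after the lexer is created
// and clear it right before the lexer is freed.
static void *dbg_watch_ptr;

// Return 1 if the watched pointer value appears in the word range that
// is about to be scanned, 0 if it is missing, and -1 if nothing is
// currently being watched.
static int dbg_watch_in_range(void *const *start, size_t n_words) {
    assert(start != NULL || n_words == 0);
    if (dbg_watch_ptr == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n_words; i++) {
        if (start[i] == dbg_watch_ptr) {
            return 1;
        }
    }
    return 0;  // a good place for a breakpoint or a trap
}
```

A result of 0 while the lexer is still alive would mean the pointer is held only in a register or in memory outside the scanned range.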

Scanning the wrong chunk of memory would trigger the assert you're seeing. You should also examine a disassembly of the mp_parse function to see if the mp_lexer pointer is stored on the stack or only saved in a register. If it is saved on the stack then you'll want to confirm that the stack address is contained in the range passed to gc_collect_root.

You should also verify that sp is aligned to a 4 byte boundary, and the address of the lexer pointer (not the address of the lexer object but the address of the lexer pointer) should also be on a 4-byte boundary. gc_collect_root only looks at uint32_t values starting at the sp pointer, so if the pointer to the lexer object were straddling 2 of those uint32_t's then it wouldn't get picked up. Normally the RTOS would ensure that the thread stacks are aligned properly, but it wouldn't be the first time I've seen a mistake like that.
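The alignment check itself is small. The helpers below are a sketch with hypothetical names, assuming the 4-byte word size mentioned above; the second one mirrors the word-count arithmetic used when calling gc_collect_root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Return nonzero if addr is a multiple of the 4-byte word size.
static int is_word_aligned(uintptr_t addr) {
    return (addr & (sizeof(uint32_t) - 1)) == 0;
}

// Number of whole 4-byte words between sp and stack_top. If sp is
// misaligned, the trailing partial word is silently dropped and every
// scanned word straddles two real stack slots.
static size_t scan_len_words(uintptr_t sp, uintptr_t stack_top) {
    assert(stack_top >= sp);
    return (size_t)(stack_top - sp) / sizeof(uint32_t);
}
```

Checking `is_word_aligned` on sp, on the stack top, and on the address of the local that holds the lexer pointer covers the three cases listed above.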

Re: Assert error during garbage collection

Posted: Sat Apr 18, 2020 7:24 am
by stijn
Just to double-check: your gccollect.c doesn't collect registers, only the stack. Is that sufficient for this controller? And is this a build with or without MICROPY_PY_THREAD?

Re: Assert error during garbage collection

Posted: Sat Apr 18, 2020 8:18 am
by Roberthh
Hello Dave, hello @stijn, thank you for your answers and attention. Let me answer @stijn's question about gc_collect() first:
Yes, the port uses threading. gc_collect() has the respective #if's, and I am pretty sure that the non-threading branch is wrong.
About collecting registers: it is an ARM M3. I do not know if that one requires collecting registers. The code for gc_collect() is pretty simple:

Code:

void gc_collect(void) {
    // start the GC

    uintptr_t sp = get_sp();

    // trace the stack, including the registers (since they live on the stack in this function)
    gc_collect_root((void **)sp, ((uint32_t)MP_STATE_THREAD(stack_top) - sp) / sizeof(uint32_t));
    gc_collect_root((void **)sp, ((uint32_t)(mpy_task_stk + MPY_STACK_LEN) - sp) / sizeof(uint32_t));
    // trace root pointers from any threads
    // end the GC
}
I verified the parameters for the gc_collect_root calls and the value of sp, and they are within the proper bounds. And even if I skip the calls to gc_collect_root() and mp_thread_gc_others(), the error happens. So a missing root pointer collection in registers could be a reason.
Dave, your hint sounds right. However, assembling that file produces an error, while assembling it within the same source tree for stm32 works. So there is some setting somewhere.

Edit: Fixed: cross compiler switch was not set. I wonder how it worked before with gcc.