
Native machine code bss corruption

Posted: Fri Oct 30, 2020 12:17 pm
by agners
Using current master, I built a Python module with native code for an ESP32 board. The module loads fine and seemed to work at first. However, after a while the results of the calculations made in native code were off. I could not reproduce the issue on x64. It happened on two independent boards (GENERIC and TINYPICO).

I then noticed that when I force a garbage collection between calls to my module, it produces the expected results. The algorithm uses a rather large struct to store its state, and that struct lives in BSS.
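The observation above (forcing a collection between native calls gives correct results) can be captured in a small wrapper. This is a minimal sketch: `call_with_gc` is a hypothetical helper name, and it only mirrors what was observed here. It is not a confirmed fix or workaround for the underlying BSS problem.

```python
import gc


def call_with_gc(fn, *args, **kwargs):
    """Run a full garbage collection, then call fn with the given arguments.

    Hypothetical helper: it reproduces the "force the GC between calls"
    observation only; it does not address the root cause.
    """
    gc.collect()
    return fn(*args, **kwargs)


# Usage on the board (assumes the native features1 module is importable):
# import features1
# test = call_with_gc(features1.access)
```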

I then tried to reproduce this with the examples, and corruption and crashes do happen from time to time when playing with the features1 example:

Code:

import gc
import features1

while True:
    # Waste heap by creating (and discarding) a file object
    open("test.txt", "w+")
    # Access the native Python module
    test = features1.access()
    print(gc.mem_alloc())
I have to start that 2 or 3 times before things get weird. It doesn't seem to corrupt the features1 data (at least the values mostly look good), but it crashes from time to time and the FAT file system gets corrupted as well.

Are there known issues with ESP32/native modules?

Best regards,
Stefan

Re: Native machine code bss corruption

Posted: Fri Oct 30, 2020 3:12 pm
by agners
I can reproduce the issue with the official ESP32 1.13 build from the website. After increasing the BSS data in features1.c (using data16[32]), starting the following example once or twice leads to the corruption:

Code:

import gc
import features1

for i in range(2000):
    # Allocate some heap
    list = []
    for i in range(128):
        list.append(42)
    features1.access()
    gc.collect()
Example output:

Code:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
KeyboardInterrupt:
>>>
paste mode; Ctrl-C to cancel, Ctrl-D to finish
=== import features1
===
=== for i in range(2000):
===     list = []
===     for i in range(128):
===             list.append(42)
===     features1.access()
===     gc.collect()
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0]
[85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0]
[85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0]
[85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0]
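When collecting logs like the one above, it can help to flag exactly which entries of a printed data16 snapshot changed. A minimal sketch, assuming the expected value is 0 (the all-zero BSS baseline printed before the failure); `find_corruption` is a hypothetical helper, not part of the features1 example.

```python
def find_corruption(snapshot, expected=0):
    """Return (index, value) pairs where a data16 snapshot differs from the baseline.

    `snapshot` is one printed list from features1.access(); `expected` is
    assumed to be 0, matching the all-zero output seen before the corruption.
    """
    return [(i, v) for i, v in enumerate(snapshot) if v != expected]


# The pattern printed after the failure: every even index holds 85 (0x55).
corrupted = [85, 0] * 16
for index, value in find_corruption(corrupted)[:3]:
    print(index, hex(value))
```

For the corrupted snapshot this prints indices 0, 2 and 4, each with value 0x55.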
I need to interrupt the execution with Ctrl+C once before it fails, so maybe exceptions are somehow involved?

Re: Native machine code bss corruption

Posted: Mon Nov 02, 2020 11:53 pm
by jimmo
I can repro this on ESP32 (but not Unix or STM32). I will raise a bug on GitHub with more info.

https://github.com/micropython/micropython/issues/6592

Re: Native machine code bss corruption

Posted: Fri Nov 06, 2020 3:19 am
by jimmo
See Damien's reply on that issue. It doesn't look like there's currently a workaround, sorry.