Native machine code bss corruption

agners
Posts: 6
Joined: Thu Jul 25, 2019 9:48 pm

Native machine code bss corruption

Post by agners » Fri Oct 30, 2020 12:17 pm

Using current master, I built a Python module with native code for an ESP32 board. The module loads fine and seems to work at first. However, after a while the calculations made in native code seemed off. I could not reproduce the issue on x64. It happened on two independent boards (GENERIC and TINYPICO).

I then noticed that when I force a garbage collection between calls to my module, it starts producing the expected results. The algorithm uses a rather large struct to store its state (which lives in bss).

I then tried to reproduce this using the examples, and corruptions/crashes do happen from time to time when playing with the features1 example:

Code:

import gc
import features1

while True:
    # Waste heap: each iteration opens the file without closing it
    open("test.txt", "w+")
    # Access the native module
    test = features1.access()
    print(gc.mem_alloc())
I have to start that 2 or 3 times for things to get weird. It doesn't seem to corrupt the features1 heap (at least the values mostly look good), but it crashes from time to time and the FAT file system gets corrupted as well.

Are there known issues with ESP32/native modules?

Best regards,
Stefan

agners
Posts: 6
Joined: Thu Jul 25, 2019 9:48 pm

Re: Native machine code bss corruption

Post by agners » Fri Oct 30, 2020 3:12 pm

I can reproduce the issue with the official ESP32 1.13 build from the website. Running the following example once or twice leads to the corruption (after increasing the BSS data in features1.c by using data16[32]):

Code:

import gc
import features1

for i in range(2000):
    list = []
    for i in range(128):
        list.append(42)
    features1.access()
    gc.collect()
For example:

Code:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
KeyboardInterrupt:
>>>
paste mode; Ctrl-C to cancel, Ctrl-D to finish
=== import features1
===
=== for i in range(2000):
===     list = []
===     for i in range(128):
===             list.append(42)
===     features1.access()
===     gc.collect()
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0]
[85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0]
[85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0]
[85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0, 85, 0]
I do need to interrupt the execution with Ctrl+C once to make it fail, so exceptions may be involved somehow.
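
To avoid eyeballing the printed lists, the check could be automated along these lines. This is only a sketch: first_deviation is a hypothetical helper, not part of MicroPython or of the features1 example, and it assumes the function under test returns the buffer contents as a sequence of ints (if your modified access() prints instead of returning, it would need adapting). It keeps the same allocation churn and gc.collect() as the loop above and reports the first iteration whose result differs from the first one seen.

```python
import gc


def first_deviation(access, iterations=2000):
    """Return (iteration, values) for the first result that differs from
    the first result seen, or None if every result matches."""
    baseline = None
    for n in range(iterations):
        # Same heap churn as in the repro loop: grow a throwaway list.
        filler = []
        for _ in range(128):
            filler.append(42)
        # Assumption: access() returns the bss-backed data as a sequence.
        values = list(access())
        gc.collect()
        if baseline is None:
            baseline = values
        elif values != baseline:
            return n, values
    return None
```

On the board this would be called as first_deviation(features1.access). Note that it only compares against the first result of the current run, so it will not notice a buffer that is already corrupted when the run starts.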

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: Native machine code bss corruption

Post by jimmo » Mon Nov 02, 2020 11:53 pm

I can repro this on ESP32 (but not Unix or STM32). I will raise a bug on GitHub with more info.

https://github.com/micropython/micropython/issues/6592

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: Native machine code bss corruption

Post by jimmo » Fri Nov 06, 2020 3:19 am

See Damien's reply on that issue. It doesn't look like there's currently a workaround, sorry...
