[SOLVED] Strange Micropython Code Execution (ESP32) due to old xtensa compiler

All ESP32 boards running MicroPython.
Target audience: MicroPython users with an ESP32 board.
Post Reply
johannes.rost
Posts: 12
Joined: Wed Jun 30, 2021 7:03 am

[SOLVED] Strange Micropython Code Execution (ESP32) due to old xtensa compiler

Post by johannes.rost » Fri Dec 17, 2021 5:22 pm

Dear community,

since a certain stage of commits in our development, we are facing strange errors: at seemingly random times, some of the try/except sections catch something like this:

First type of error:

Code: Select all

Reason: unsupported types for __lt__: 'NoneType', 'int'
Reason: unsupported types for __truediv__: 'float', 'function'
Reason: unsupported types for __lt__: 'NoneType', 'int'
Reason: unsupported type for __hash__: 'dict'
even if the regarding lines of code are executed fine otherwise.

Second type of error:

or in the data JSON in a SD card, this is logged:

Code: Select all

"sensor value": <function "linear" at 0x3fa0ae50>
before "sensor value" is written to SD card, there is some dataprocessing happening in this style:

Code: Select all

sensorvalue['sensor value'] = linear('sensor name', sensorvalue, Y)
(note that the dictionary is also used as a parameter to the function, may that be a good architecture decision or not ;) )

So, instead of setting the value to the result of the function, the function object itself is written to the position in the dictionary, and the string serialization transforms that into the <...> representation when the dictionary is json.dumps-ed to the SD card. But that just seems another form of the errors of the first kind.

The errors seem only to appear if there is additional C-code present (beginning with a certain commit). It does not matter if the C-code is acutally used; the errors also appear if none of the C-functions are accessed. But they seem to occur more often, if the statically defined data uint8_t - buffers are larger or the C-code is actually used (creates two additional freeRTOS tasks with around 6k of stack each).

So the current hypothesis is: the C-code generates additional flash and RAM usage for the statically allocated buffers and other variables. That may not leave enough stack for the freeRTOS Python-task on the ESP32 (with external PSRAM) to run uninterruptedly.

This is what the compiler puts out as statistics after compilation:

Code: Select all

xtensa-esp32-elf-size build/WIPY/release/application.elf
   text    data     bss     dec     hex filename
1016180  628056   47060 1691296  19cea0 build/WIPY/release/application.elf
we are using Pycom 1.20.2.r4 and 1.20.2.r6, error happens in both versions. We already stripped off Bluetooth an CAN modules.

Is the hypothesis correct?
Did anyone here experience such a behaviour?
If that is linked to the size of the C-code and variables, can we, already at compile time, check violation of limits before running that code on the device?

Thanks in advance for help!

PS: also posted here:
https://forum.pycom.io/topic/7408/stran ... sed-c-code
Last edited by johannes.rost on Fri Jan 14, 2022 11:32 am, edited 3 times in total.

User avatar
pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK
Contact:

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Post by pythoncoder » Sat Dec 18, 2021 1:00 pm

It might be worth raising this in the PyCom forum.
Peter Hinch
Index to my micropython libraries.

johannes.rost
Posts: 12
Joined: Wed Jun 30, 2021 7:03 am

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Post by johannes.rost » Tue Dec 21, 2021 9:32 am

Hi pythoncoder,

thanks for your fast reply! So do you think, that this might be more of a Pycom/ESP32 specific problem, than something that is related to Micropython itself?

johannes.rost
Posts: 12
Joined: Wed Jun 30, 2021 7:03 am

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Post by johannes.rost » Fri Jan 07, 2022 12:21 pm

Some new insights:
  • Those erros can also occur after a freshly cloned pycom 1.20.2.r6 version
    • without Bluetooth and CAN
    • without any custom C code
    • with frozen Python code
    but we also have compiled versions with 1.20.2.r6 that work without those errors.
  • It does not seem to matter if Python code to be frozen is located in "Custom" or a different folder (added in mkrules.mk)
I am investigating further, but it seems that in other versions that worked the Python code was frozes by an mpy-cross which was compiled with GCC 11.2.0. The ones that seem to cause trouble were compiled with GCC 10.3.0. The binaries of both mpy-cross versions are different in some places. But the resulting firmware binaries are the same. So maybe the tests were not long enough to observe the erroneoues behaviour.

johannes.rost
Posts: 12
Joined: Wed Jun 30, 2021 7:03 am

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Post by johannes.rost » Tue Jan 11, 2022 7:42 am

OK now I am lost and hopefully someone can put me in the right direction or has a hint about this.

I compiled the untouched pycom 1.20.2.r6 with custom Python code under esp32/frozen/Custom. No submodules involved there.

The version I compiled (Ubuntu Mate VM 21.10, GCC 11.2.0) shows the strange errors above on a device, that with a different version before did not produce these errors. So a device failure is ruled out.
The version another person compiled on their machine (Ubuntu Mate VM 21.10, GCC 11.2.0) does NOT schow the strange errors. BUT: using a mpy-cross compiled on my machine, the other person was also able to compile a binary with which the devices produced such errors. We are now about to compare compiled binaries between the two VMs.

If the frozen bytecode sometimes is executed wrong, or memory is confused somehow, why could it seemingly be dependent on the machine the code is compiled on?
Why does something like this happen at all?

johannes.rost
Posts: 12
Joined: Wed Jun 30, 2021 7:03 am

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Post by johannes.rost » Wed Jan 12, 2022 2:16 pm

Update:
There was a misunderstinge between the other person and me. So the binary that did produce the strange execution behaviour was actually compiled on a (third) different machine, which had the xtensa compiler version 1.22.0-80 installed. The versions that worked fine were built with 1.22.0-98, and on my machine (which produced binaries that show strange behaviour) again with 1.22.0-80.

Current summary:

VM1: always shows errors
GCC: 10.3.0 and 11.2.0
xtensa: 1.22.0-80

VM2: never showed errors
GCC: 11.2.0
xtensa: 1.22.0-98

VM3: always shows errors
GCC: 11.2.0
xtensa: 1.22.0-80

Tests still pending to see if that is provable by using xtensa 1.22.0-98 on VM1

User avatar
Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Post by Roberthh » Wed Jan 12, 2022 5:05 pm

Yor are talking about the Pycom branch of Micropython. You may get better support at the Pycom forum. Since I worked a little bit with Oycom devices and the firmware I could make a blind guess. The pycom firmware alone alone fills it's partitions. So adding code may cause your problem.

johannes.rost
Posts: 12
Joined: Wed Jun 30, 2021 7:03 am

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Post by johannes.rost » Fri Jan 14, 2022 11:31 am

Thank you very much Roberthh for your reply! That could indeed have been a problem, there is not too much space left for custom code. Fortunately, I checked that already before. Also, it appears in versions with binary size around 1.6M and around 1.9M.
Most interestingly, it is always like that: the device runs for some time (in the realm of several hours), and after that, the described errors repeatedly appear in shorter cycles than the time span til the first appearance of the error.
I already posted at Pycom's forum, but since it still could have been possible that this has something to do with mpy-cross or the Micropython VM environment in general, I kept this thread updated.

I think we were able to track the origin of the error down, at least in this specific environment (ESP32, Pycom 1.20.2.r6, custom Python code, also combination of custom Python and C code):

All binaries compilled with xtensa 1.22.0-80 show this strange behaviour.
All binaries compiled with xtensa 1.22.0-98 DO NOT show this strange behaviour (same device, same VM, other devices, different VM)

So for now, this seems to be solved by simply use a more recent compiler.

Post Reply