Page 1 of 1

[SOLVED] Strange Micropython Code Execution (ESP32) due to old xtensa compiler

Posted: Fri Dec 17, 2021 5:22 pm
by johannes.rost
Dear community,

since a certain stage of commits in our development, we are facing strange errors: at seemingly random times, some of the try/except sections catch something like this:

First type of error:

Code: Select all

Reason: unsupported types for __lt__: 'NoneType', 'int'
Reason: unsupported types for __truediv__: 'float', 'function'
Reason: unsupported types for __lt__: 'NoneType', 'int'
Reason: unsupported type for __hash__: 'dict'
even if the regarding lines of code are executed fine otherwise.

Second type of error:

or in the data JSON in a SD card, this is logged:

Code: Select all

"sensor value": <function "linear" at 0x3fa0ae50>
before "sensor value" is written to SD card, there is some dataprocessing happening in this style:

Code: Select all

sensorvalue['sensor value'] = linear('sensor name', sensorvalue, Y)
(note that the dictionary is also used as a parameter to the function, may that be a good architecture decision or not ;) )

So, instead of setting the value to the result of the function, the function object itself is written to the position in the dictionary, and the string serialization transforms that into the <...> representation when the dictionary is json.dumps-ed to the SD card. But that just seems another form of the errors of the first kind.

The errors seem only to appear if there is additional C-code present (beginning with a certain commit). It does not matter if the C-code is acutally used; the errors also appear if none of the C-functions are accessed. But they seem to occur more often, if the statically defined data uint8_t - buffers are larger or the C-code is actually used (creates two additional freeRTOS tasks with around 6k of stack each).

So the current hypothesis is: the C-code generates additional flash and RAM usage for the statically allocated buffers and other variables. That may not leave enough stack for the freeRTOS Python-task on the ESP32 (with external PSRAM) to run uninterruptedly.

This is what the compiler puts out as statistics after compilation:

Code: Select all

xtensa-esp32-elf-size build/WIPY/release/application.elf
   text    data     bss     dec     hex filename
1016180  628056   47060 1691296  19cea0 build/WIPY/release/application.elf
we are using Pycom 1.20.2.r4 and 1.20.2.r6, error happens in both versions. We already stripped off Bluetooth an CAN modules.

Is the hypothesis correct?
Did anyone here experience such a behaviour?
If that is linked to the size of the C-code and variables, can we, already at compile time, check violation of limits before running that code on the device?

Thanks in advance for help!

PS: also posted here:
https://forum.pycom.io/topic/7408/stran ... sed-c-code

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Posted: Sat Dec 18, 2021 1:00 pm
by pythoncoder
It might be worth raising this in the PyCom forum.

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Posted: Tue Dec 21, 2021 9:32 am
by johannes.rost
Hi pythoncoder,

thanks for your fast reply! So do you think, that this might be more of a Pycom/ESP32 specific problem, than something that is related to Micropython itself?

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Posted: Fri Jan 07, 2022 12:21 pm
by johannes.rost
Some new insights:
  • Those erros can also occur after a freshly cloned pycom 1.20.2.r6 version
    • without Bluetooth and CAN
    • without any custom C code
    • with frozen Python code
    but we also have compiled versions with 1.20.2.r6 that work without those errors.
  • It does not seem to matter if Python code to be frozen is located in "Custom" or a different folder (added in mkrules.mk)
I am investigating further, but it seems that in other versions that worked the Python code was frozes by an mpy-cross which was compiled with GCC 11.2.0. The ones that seem to cause trouble were compiled with GCC 10.3.0. The binaries of both mpy-cross versions are different in some places. But the resulting firmware binaries are the same. So maybe the tests were not long enough to observe the erroneoues behaviour.

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Posted: Tue Jan 11, 2022 7:42 am
by johannes.rost
OK now I am lost and hopefully someone can put me in the right direction or has a hint about this.

I compiled the untouched pycom 1.20.2.r6 with custom Python code under esp32/frozen/Custom. No submodules involved there.

The version I compiled (Ubuntu Mate VM 21.10, GCC 11.2.0) shows the strange errors above on a device, that with a different version before did not produce these errors. So a device failure is ruled out.
The version another person compiled on their machine (Ubuntu Mate VM 21.10, GCC 11.2.0) does NOT schow the strange errors. BUT: using a mpy-cross compiled on my machine, the other person was also able to compile a binary with which the devices produced such errors. We are now about to compare compiled binaries between the two VMs.

If the frozen bytecode sometimes is executed wrong, or memory is confused somehow, why could it seemingly be dependent on the machine the code is compiled on?
Why does something like this happen at all?

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Posted: Wed Jan 12, 2022 2:16 pm
by johannes.rost
Update:
There was a misunderstinge between the other person and me. So the binary that did produce the strange execution behaviour was actually compiled on a (third) different machine, which had the xtensa compiler version 1.22.0-80 installed. The versions that worked fine were built with 1.22.0-98, and on my machine (which produced binaries that show strange behaviour) again with 1.22.0-80.

Current summary:

VM1: always shows errors
GCC: 10.3.0 and 11.2.0
xtensa: 1.22.0-80

VM2: never showed errors
GCC: 11.2.0
xtensa: 1.22.0-98

VM3: always shows errors
GCC: 11.2.0
xtensa: 1.22.0-80

Tests still pending to see if that is provable by using xtensa 1.22.0-98 on VM1

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Posted: Wed Jan 12, 2022 5:05 pm
by Roberthh
Yor are talking about the Pycom branch of Micropython. You may get better support at the Pycom forum. Since I worked a little bit with Oycom devices and the firmware I could make a blind guess. The pycom firmware alone alone fills it's partitions. So adding code may cause your problem.

Re: Strange Micropython Code Execution With Custom (But Unused) C-Code

Posted: Fri Jan 14, 2022 11:31 am
by johannes.rost
Thank you very much Roberthh for your reply! That could indeed have been a problem, there is not too much space left for custom code. Fortunately, I checked that already before. Also, it appears in versions with binary size around 1.6M and around 1.9M.
Most interestingly, it is always like that: the device runs for some time (in the realm of several hours), and after that, the described errors repeatedly appear in shorter cycles than the time span til the first appearance of the error.
I already posted at Pycom's forum, but since it still could have been possible that this has something to do with mpy-cross or the Micropython VM environment in general, I kept this thread updated.

I think we were able to track the origin of the error down, at least in this specific environment (ESP32, Pycom 1.20.2.r6, custom Python code, also combination of custom Python and C code):

All binaries compilled with xtensa 1.22.0-80 show this strange behaviour.
All binaries compiled with xtensa 1.22.0-98 DO NOT show this strange behaviour (same device, same VM, other devices, different VM)

So for now, this seems to be solved by simply use a more recent compiler.