Compiled modules load time

General discussions and questions abound development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
Post Reply
agonnen
Posts: 27
Joined: Sat Oct 13, 2018 7:52 pm

Compiled modules load time

Post by agonnen » Sun Feb 07, 2021 11:23 pm

Hi,

I'm compiling modules for ESP32 with:

Code: Select all

mpy-cross -march=xtensawin
The compiled mpy modules are imported, loaded and start running after boot.

I expected these modules to be loaded very fast when imported since they are already compiled. However, it takes several seconds after boot until the modules are imported, loaded and start running.
On the other hand, when freezing the same modules I don't see this delay. The modules are imported instantly and start running immediately after boot.

In both cases, compiled-modules and frozen-modules, the modules are already compiled and stored on Flash.
So why is the importing of compiled mpy modules so much slower compared to frozen modules?
Is all of it the vfs overhead of reading mpy files into RAM, or am I missing something in the flow?

Compiled mpy modules have many advantages over frozen modules, but their long load time is a big disadvantage.

Thanks,
Amir

User avatar
jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: Compiled modules load time

Post by jimmo » Mon Feb 08, 2021 6:36 am

agonnen wrote:
Sun Feb 07, 2021 11:23 pm
So why is the importing of compiled mpy modules so much slower compared to frozen modules?
Is all of it the vfs overhead of reading mpy files into RAM, or am I missing something in the flow?
Frozen modules are executed directly from flash. The flash is memory mapped and the VM executes the bytecode from "memory".

.mpy files unfortunately have to be first copied into RAM, as there is no way to memory-map the filesystem. (This is probably _technically_ possible, but the implementation would be quite difficult without an MMU / virtual memory).

So compiled files technically run a bit faster but load slower, whereas frozen files pay for occasional cache misses at runtime.

User avatar
Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Compiled modules load time

Post by Roberthh » Mon Feb 08, 2021 7:28 am

Your compiled modules must be very large if loading from the file system takes several seconds. Are you sure that there is not still the .py version of the module in the files system. That will take preference.
Side note: on a SPIRAM device there will also be cache misses for modules loaded to (SPI-)RAM.

agonnen
Posts: 27
Joined: Sat Oct 13, 2018 7:52 pm

Re: Compiled modules load time

Post by agonnen » Mon Feb 08, 2021 10:55 pm

Roberthh wrote:
Mon Feb 08, 2021 7:28 am
Your compiled modules must be very large if loading from the file system takes several seconds. Are you sure that there is not still the .py version of the module in the files system. That will take preference.
No, there is something else going on here.
There are 13 .mpy files, total of 50KB. Not large at all.
There are no .py files on the device file system.

But still, when trying to import them everything freezes for ~5 seconds.
Copying this 50KB from VFS to RAM can't be that slow.
It looks like either some heavy processing is being performed on the data of the mpy files, or some explicit sleep is taking place.
In persistentcode.c it seems that the mpy files are read byte-by-byte. Maybe that's done without buffering in an inefficient way?

I'm using the latest Micropython v1.14. Has anyone else experienced this problem?

kevinkk525
Posts: 969
Joined: Sat Feb 03, 2018 7:02 pm

Re: Compiled modules load time

Post by kevinkk525 » Tue Feb 09, 2021 6:02 am

Does your module use psram? My esp32 with psram is painfully slow when it comes to loading modules. But 5 seconds does seem a little long but might not be unusal. My whole project consists of .mpy files and on start it loads up to 7 files and it takes a while..
Kevin Köck
Micropython Smarthome Firmware (with Home-Assistant integration): https://github.com/kevinkk525/pysmartnode

User avatar
Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Compiled modules load time

Post by Roberthh » Tue Feb 09, 2021 6:57 am

File accesses are buffered. So even if the stream is read byte-by-byte, there is a buffer in the way. But caching may get a problem, if the code and memory areas touch in the way are larger than the cache size. As far as I recall, a cache refresh takes 100-300µs. If that happens for every byte of the 50k, that would be 5-15 seconds. Even if that happens only for every 8 byte block, it would be ~1-2 seconds. This load & store is not needed if the code is frozen.

agonnen
Posts: 27
Joined: Sat Oct 13, 2018 7:52 pm

Re: Compiled modules load time

Post by agonnen » Tue Feb 09, 2021 10:39 pm

Time to connect JTAG!

Random breaks while importing mpy modules always end up at qstr_find_strn

Code: Select all

(gdb) bt
#0  0x400d3734 in qstr_find_strn (str=0x3f819960 "set_value", str_len=9) at ../../py/qstr.c:181
#1  0x400d377f in qstr_from_strn (str=0x3f819960 "set_value", len=9) at ../../py/qstr.c:197
#2  0x400df4f9 in load_qstr (reader=0x3ffd4be0, qw=0x3ffd4b70) at ../../py/persistentcode.c:279
#3  0x400df5e8 in load_bytecode (ip_top=0x3f81995f "", ip=0x3f819900 "", qw=0x3ffd4b70, reader=0x3ffd4be0) at ../../py/persistentcode.c:339
#4  load_raw_code (reader=0x3ffd4be0, qw=0x3ffd4b70) at ../../py/persistentcode.c:381
#5  0x400df9ac in load_raw_code (reader=0x3ffd4be0, qw=0x3ffd4b70) at ../../py/persistentcode.c:492
#6  0x400df9ac in load_raw_code (reader=0x3ffd4be0, qw=0x3ffd4b70) at ../../py/persistentcode.c:492
#7  0x400dfb21 in mp_raw_code_load (reader=0x3ffd4be0) at ../../py/persistentcode.c:567
#8  0x400dfb41 in mp_raw_code_load_file (filename=0x3ffd4c40 "view.mpy") at ../../py/persistentcode.c:583
#9  0x400eb3b0 in do_load (module_obj=0x3f817ab0, file=<optimized out>) at ../../py/builtinimport.c:216
#10 0x400eb722 in mp_builtin___import__ (n_args=<optimized out>, args=<optimized out>) at ../../py/builtinimport.c:449
So I think what happening is that during import, the bytecode is scanned for strings and each of them is converted into a qstr. The qstr pools are searched sequentially in a for-loop, for each string in the bytecode.
Since the pool is big (pool->len is 4450) it's very inefficient and takes very long (probably many cache misses).

On frozen code, on the other hand, strings are converted to qstrs during compile time so this str-to-qstr conversion is not performed at all during import, and everything is smooth and fast.

Now - how to solve this?

Some ideas:
  • More efficient pool search. Hash table for example.
  • In my case most of the strings don't come from the mpy modules, but from other builtin modules (LVGL library probably).
    The number of strings that are unique to the mpy modules is relatively small.
    So maybe a different qstr pool for "dynamic" qstrs that come from mpy files, which would be much smaller and faster to search.
  • Maybe just change the search order in qstr_find_strn such that new strings (that come from the mpy files) come first and not last, assuming that strings that are known on compile time were already converted to qstr on the bytecode when compiling the mpy.
Other thoughts?

agonnen
Posts: 27
Joined: Sat Oct 13, 2018 7:52 pm

Re: Compiled modules load time

Post by agonnen » Sat Feb 13, 2021 10:56 pm

Update - sorting qstr and using binary search improve things significantly!

I've created a PR: https://github.com/micropython/micropython/pull/6896

User avatar
jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: Compiled modules load time

Post by jimmo » Mon Feb 15, 2021 2:28 am

agonnen wrote:
Sat Feb 13, 2021 10:56 pm
Update - sorting qstr and using binary search improve things significantly!

I've created a PR: https://github.com/micropython/micropython/pull/6896
Thanks, nice work! And great results!

It appears that the code size increase is quite small (and it may be possible to make this optional anyway).

Post Reply