Length of identifier names and RAM usage

C programming, build, interpreter/VM.
Target audience: MicroPython Developers.

Post by pythoncoder » Thu Aug 05, 2021 5:09 am

This is perhaps academic but from reading the docs and watching one of Damien's presentations I had formed the idea that a worthwhile optimisation was to use short names (and to re-use them). The docs indicate that names no longer than 10 characters get interned as qstrs. So I did this test.

I recently ported a fair-sized module from CPython to MicroPython. It has a good number of module-level functions with long names starting with an underscore, e.g.

Code: Select all

def _unpack_integer(code, fp):
    # code

I checked the memory usage of a demo which runs this code, calling a good number of these functions. I then modified the code to shorten all function names to < 10 chars, e.g. in this case _u_int. I couldn't measure any difference in RAM use (running on a Pyboard 1.1).

My approach was to soft reset the Pyboard and issue

Code: Select all

import gc
gc.collect()
print(gc.mem_free())
import asyntest  # Demo code

The demo runs a task which does the same collect-report sequence every 10s.
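
One way to make this kind of comparison repeatable is to wrap the collect-and-report sequence in a helper and measure the import as a before/after difference. The following is only a sketch: free_ram and import_cost are hypothetical names, not part of MicroPython, and the hasattr guard is an assumption added so the code also loads on CPython, where gc.mem_free() does not exist.

```python
import gc


def free_ram():
    """Collect garbage, then return free heap bytes (None where unsupported)."""
    gc.collect()
    # gc.mem_free() is a MicroPython extension; CPython has no equivalent.
    return gc.mem_free() if hasattr(gc, "mem_free") else None


def import_cost(module_name):
    """Return the drop in free heap caused by importing module_name."""
    before = free_ram()
    __import__(module_name)
    after = free_ram()
    if before is None or after is None:
        return None
    return before - after
```

On the board this would be called once after a soft reset, e.g. print(import_cost("asyntest")). A module that has already been imported costs nothing the second time, so the figure is only meaningful on the first call.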

I appreciate this is rather theoretical but I am interested in knowing what optimisations are worthwhile and why.
Peter Hinch
Index to my micropython libraries.

Post by jimmo » Thu Aug 05, 2021 5:57 am

pythoncoder wrote (Thu Aug 05, 2021 5:09 am):
"I had formed the idea that a worthwhile optimisation was to use short names (and to re-use them)."

This was my understanding too :)

pythoncoder wrote (Thu Aug 05, 2021 5:09 am):
"The docs indicate that names no longer than 10 characters get interned as qstrs."

This specifically applies to literal strings. Anything used as an identifier will always become a QSTR regardless of length.

(Surprisingly this applies not just to globals but also locals -- the parser/compiler needs to track them, and then has no way to free them afterwards because they may be dupes).

pythoncoder wrote (Thu Aug 05, 2021 5:09 am):
"I then modified the code to shorten all function names to < 10 chars, e.g. in this case _u_int. I couldn't measure any difference in RAM use (running on a Pyboard 1.1)."

This is surprising to me but I don't have a good explanation.

Probably the best diagnostic is to run

Code: Select all

import micropython
micropython.qstr_info(True)

and see if there is a difference in the amount of RAM used for interned string data.

QSTR data is stored in two places -- the pools (a linked list of pools, each holding an array of pointers to the start of the strings), and the actual packed string data (stored in chunks allocated on the heap). The pool usage is proportional only to the number of strings. Chunks are allocated by either growing the previous chunk by the required amount, or starting a new chunk. New chunks start at 128 bytes.

So it's possible that fragmentation in the chunks is leading to the same total memory usage... but this seems like an amazing coincidence.
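
The chunk behaviour described above can be played with in a toy model. This is a rough sketch rather than the real allocator: chunk_bytes, PER_STR_OVERHEAD and grow_ok are made-up names, the per-string overhead of 4 bytes is an assumption, and the "shrink the old chunk when it cannot grow" step is likewise assumed. Only the 128-byte initial chunk size and the grow-or-start-new rule come from the description.

```python
CHUNK_INIT = 128       # size of a freshly started chunk (figure from the post)
PER_STR_OVERHEAD = 4   # ASSUMED bytes of hash/length/terminator per string


def chunk_bytes(names, grow_ok=True):
    """Return total bytes allocated to chunks after interning names.

    grow_ok is a hypothetical switch: True means the last chunk can
    always be grown in place, False means it never can.
    """
    chunks = []   # allocated size of each chunk
    used = 0      # bytes used in the last chunk
    for name in names:
        need = len(name) + PER_STR_OVERHEAD
        if chunks and used + need <= chunks[-1]:
            used += need                  # fits in the current chunk
        elif chunks and grow_ok:
            chunks[-1] += need            # grow by the required amount
            used += need
        else:
            if chunks:
                chunks[-1] = used         # assumed: old chunk shrunk to fit
            chunks.append(max(need, CHUNK_INIT))   # start a new chunk
            used = need
    return sum(chunks)


long_names = ["a" * 17 + str(i) for i in range(1, 9)]
mid_names = ["a" * 9 + str(i) for i in range(1, 9)]
short_names = ["a" + str(i) for i in range(1, 9)]
print(chunk_bytes(long_names), chunk_bytes(mid_names), chunk_bytes(short_names))
```

In this model the eight long names overflow the first chunk, while both shorter sets fit inside a single 128-byte chunk and so cost the same. That is one way shortening names could stop making a measurable difference, although it does not prove this is what happened in the test above.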

Post by jimmo » Thu Aug 05, 2021 6:15 am

As a simple example:

Code: Select all

import gc
import micropython

aaaaaaaaaaaaaaaaa1 = True
aaaaaaaaaaaaaaaaa2 = True
aaaaaaaaaaaaaaaaa3 = True
aaaaaaaaaaaaaaaaa4 = True
aaaaaaaaaaaaaaaaa5 = True
aaaaaaaaaaaaaaaaa6 = True
aaaaaaaaaaaaaaaaa7 = True
aaaaaaaaaaaaaaaaa8 = True

gc.collect()
print(gc.mem_free())
micropython.qstr_info(True)

This will print (on the Linux port):

Code: Select all

2071712
qstr pool: n_pool=1, n_qstr=11, n_str_data_bytes=250, n_total_bytes=442

If I remove one "a" from each of those names, it will print:

Code: Select all

2071840
qstr pool: n_pool=1, n_qstr=11, n_str_data_bytes=242, n_total_bytes=434

but any further shortening still gives mem_free=2071840.

So I guess if the total number of bytes that you saved by shortening the function names was less than 128 bytes (and the conditions were right in the way the chunks were laid out), then it's possible you would see no difference.
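
To check a given set of renames against that 128-byte figure, the arithmetic is just the sum of the characters removed. A minimal sketch, assuming one byte per character; bytes_saved and the renames dict are hypothetical, and only the _unpack_integer to _u_int pair comes from this thread.

```python
CHUNK_INIT = 128  # initial chunk size quoted earlier in the thread


def bytes_saved(renames):
    """Sum the characters removed across a mapping of old name -> new name."""
    return sum(len(old) - len(new) for old, new in renames.items())


# Only this pair is from the thread; a real module would list every rename.
renames = {"_unpack_integer": "_u_int"}

saved = bytes_saved(renames)
print(saved, "bytes saved;", "below" if saved < CHUNK_INIT else "at or above",
      "one initial chunk")
```

A saving below one initial chunk does not guarantee an unchanged mem_free reading, since that depends on how the chunks happen to be laid out, but it shows how many renames of this size would be needed before a difference becomes likely.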

Post by pythoncoder » Fri Aug 06, 2021 8:13 am

Thank you for the clarification. I'll hazard a guess that in my test the savings were lost in the noise in the context of the total RAM usage of my code.
