GC and Heap

C programming, build, interpreter/VM.
Target audience: MicroPython Developers.
jickster
Posts: 629
Joined: Thu Sep 07, 2017 8:57 pm

Post by jickster » Wed Jun 06, 2018 1:47 pm

You can nest Python executing Python to any depth. The state is saved on the stack, so the number of nested Py-calls-Py levels is limited only by the stack size.

When I get to a PC, I’ll show you the line of code that implements this.

Post by jickster » Wed Jun 06, 2018 2:16 pm

I gave you the 5 conditions necessary to satisfy if you want to save/restore the heap. If you think you can satisfy them, problem solved.


There is another way to minimize the amount of heap you need to allocate, and it’s related to garbage collection. The more often you run garbage collection, the less heap fragmentation you’ll have. Why should you care? Heap fragmentation is a major cause of out-of-memory errors.

To run the gc, the script (or REPL input) has to contain calls to gc.collect(), but there is another way that doesn’t require the user to issue these calls periodically.

How timing-sensitive is your REPL interface? On my Cortex-M4, gc takes 0.6 ms (worst case) with 4K of heap. If you’re OK with each REPL line carrying that overhead, you could call the C implementation, gc_collect(), after every REPL line has finished executing.

It’s not guaranteed to fix all fragmentation issues but definitely most.
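
As a rough Python-level illustration of the "collect after every REPL line" idea: the change described above would be a gc_collect() call in the C REPL loop, so this is only an analogue. run_lines is a hypothetical helper, not part of MicroPython, and the sketch assumes compile() and exec() are available (they are in CPython).

```python
import gc


def run_lines(lines, namespace=None):
    """Execute each source line, then force a collection.

    Mimics a REPL that collects garbage once per input line.
    """
    if namespace is None:
        namespace = {}
    for line in lines:
        code = compile(line, "<repl>", "exec")
        exec(code, namespace)
        gc.collect()  # one collection per "REPL line"
    return namespace


ns = run_lines([
    "data = [bytearray(64) for _ in range(8)]",
    "data = None",  # the buffers become garbage here
    "x = 1 + 2",
])
```

Note that a line such as "a = f(); b = g(); c = h()" still gets only one collection at the end, which is the limitation discussed next.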


There is one issue with calling gc_collect() only once per REPL line: multiple statements can be joined into one line with semicolons. In that case, especially when looping, the heap could become irreparably fragmented, since one “line” could be an entire script and it’s not possible to detect multi-statement lines in the compiled bytecode.

If you’re really paranoid, you can call gc_collect() after every single bytecode is executed. There’s also a less drastic choice: call gc_collect() on every jump (see vm.c) in addition to once per REPL line.

Even if the user executes one statement per line, you still COULD have issues with heap fragmentation: a Py function call that resolves to a C-module function call can run a C for-loop that performs many heap operations (alloc and free) and leaves the heap fragmented. I don’t think you have to worry about builtin C modules doing this, just C modules that a third party may write.

Post by jickster » Wed Jun 06, 2018 8:03 pm

mp_stack_ctrl_init() only records the top of the stack so that stack usage can be calculated.
You also have to set the stack size limit with mp_stack_set_limit() (see stackctrl.c).

If you don’t want to use the uPy builtin stack check, this function is irrelevant because you’d set the MICROPY_STACK_CHECK macro to 0 and it’d get ignored.

If you do want the stack check, you’re doing the right thing by calling it each time you call a uPy function, because you have no idea how deep that function’s call chain goes.
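
Seen from the script side, the effect of the stack check can be sketched like this. It assumes the check is enabled, in which case MicroPython raises RuntimeError when the limit is exceeded. CPython behaves the same way through its recursion limit (RecursionError is a subclass of RuntimeError). measure_depth is a made-up helper for illustration.

```python
def measure_depth():
    """Recurse until the interpreter refuses, and report how deep we got."""
    depth = 0

    def nest():
        nonlocal depth
        depth += 1
        nest()  # no base case on purpose: rely on the stack check

    try:
        nest()
    except RuntimeError:
        # The interpreter stopped the recursion instead of
        # overflowing the C stack.
        pass
    return depth


reached = measure_depth()
```

The depth reached depends on the configured stack limit and on how much stack was already in use when the call started, which is why the limit has to be set relative to the stack top.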

Post by jickster » Mon Jun 18, 2018 8:02 pm

Cyrille de Brébisson wrote:
Mon Jun 04, 2018 9:06 am
ping. updates?

cefn
Posts: 230
Joined: Tue Aug 09, 2016 10:58 am

Post by cefn » Mon Jun 18, 2018 11:58 pm

Isn't there a different architectural approach which could deliver what you want without tinkering under the hood of Micropython?

For example, there's first class support for continuations in the form of generators. Can't yield give you what you want, in the sense of resuming at some application state in a convenient manner?
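
A minimal sketch of what that looks like (running_total is a made-up example, not code from this thread): the generator keeps its local state between calls and resumes exactly where it yielded.

```python
def running_total():
    """Generator that remembers its state between resumptions."""
    total = 0
    while True:
        # Execution pauses here; send() resumes it with a new value.
        value = yield total
        total += value


task = running_total()
next(task)             # run up to the first yield
first = task.send(5)   # resume, add 5, pause again
second = task.send(10) # resume, add 10, pause again
```

Here first is 5 and second is 15: the local variable total survives between the two send() calls without any explicit save/restore code.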

Post by jickster » Tue Jun 19, 2018 2:39 am

cefn wrote:
>Isn't there a different architectural approach which could deliver what you want without tinkering under the hood of Micropython?
>
>For example, there's first class support for continuations in the form of generators. Can't yield give you what you want, in the sense of resuming at some application state in a convenient manner?

If you want to switch between multiple Python executables, i.e. at the Python level, that would work.

But he wants to switch between Python and non-Python executables, i.e. between OS processes.

Cyrille de Brébisson
Posts: 8
Joined: Wed May 30, 2018 7:46 am

Post by Cyrille de Brébisson » Tue Jun 26, 2018 11:10 am

Hello,

>If you want to switch between multiple Python executables ie at the Python-level, that would work.
>But he wants to switch between Python and non-Python executables ie between OS processes.

Not exactly.

I only have one single process. As a matter of fact, I only have one single thread.

However, I have multiple programming languages (all interpreted); the one of main interest here is PPL.
PPL is very well integrated with the system. It can access and control system variables and functions/functionalities.
It can also be used to add new functionality to the system. A program or variable created in PPL will in essence enhance the system, adding to it, and be indistinguishable from the original system functions.
If you have ever worked with Smalltalk, it is a similar system.

Now, I am adding Python. And Python needs to "play nice" and integrate nicely with the system.
I could have Python be an independent "side program", like it is on a PC...
But this would be very limited. It would mean that if someone has already created something in PPL, it cannot be reused in Python. It would also mean that I would need to recreate Python versions of all the existing system stuff (including a full Computer Algebra System, graphing functions...). I am sure that all this already exists for Python, but even if I tracked it all down, it still would not merge with the rest of the system. It would forever be "a side item".

So, I need to make sure that Python can call PPL and PPL can call Python.
When PPL sets up a context (local variables, for example) and calls Python, and Python then tries to access one of those local variables, it has to work.
I also need to make sure that if PPL calls Python, which calls PPL, which itself calls back into Python, it still works!

Of course, there are memory allocations to think about as well. Since I need to pre-allocate the heap for Python, how many such recursive calls can I make before it breaks from memory issues, given that the Python heap is 1MB?

Cyrille

Post by jickster » Wed Jun 27, 2018 6:01 pm

Cyrille de Brébisson wrote:
Tue Jun 26, 2018 11:10 am
>So, I need to make sure that Python can call PPL and PPL can call Python.
>When PPL sets up a context (local variables, for example) and calls Python, and Python then tries to access one of those local variables, it has to work.
>I also need to make sure that if PPL calls Python, which calls PPL, which itself calls back into Python, it still works!
>
>Of course, there are memory allocations to think about as well. Since I need to pre-allocate the heap for Python, how many such recursive calls can I make before it breaks from memory issues, given that the Python heap is 1MB?
>
>Cyrille

I'm not entirely clear on the environment and the interaction between PPL and Python.

Please address these questions:
1. Can I assume that you're "starting in" REPL and then you will call a Python function that is implemented in C that will then call your PPL API?

2. Once you're in PPL and you want to call Python, are you going to call Python by calling the C-implementation directly or by creating a valid string expression and executing it?

Post by Cyrille de Brébisson » Thu Jun 28, 2018 10:50 am

Hello,

>1. Can I assume that you're "starting in" REPL and then you will call a Python function that is implemented in C that will then call your PPL API?

The starting point could be either the REPL or a PPL program that calls the Python VM (spawning a Python VM with a text string to parse/execute).
There are 2 ways to call Python from PPL:
python("program_string", n string arguments passed in argv); as an instruction in the PPL program.
But you can also declare a Python "function" by doing:
#python name(args)
python code
#end
Then calling name(args); in PPL will cause a python(program string, args); instruction to be generated.
This allows calling a Python program from PPL as if it were a regular PPL function.

From Python, I will need to create multiple ways to interact with the PPL-based system. But the most "complex" case would be an "exec"-type function, similar to the way PPL calls Python. Let us say:
PPL.eval("PPL expression", n arguments (recognized types would be transformed into the equivalent PPL types, the others transformed into strings)).
If I were to create a PPL object type in Python, I could use it to bypass the string stage!
One of the issues is that PPL is a symbolic manipulation language. It can handle mathematical expressions like 'A+SIN(B)' where A and B are local variables.
Example of use (PPL "programs" can contain PPL code (the default), but also Python code and another language! The #type directive handles the switches):

// assuming that the derivative of expression is >0 for variable in [guess1..guess2] (i.e. expression keeps increasing)
// and that there is a value of variable such that expression(variable)=0
// find this value
#python solve(expression, variable, guess1, guess2) // This is PPL file syntax to include some Python code. There is no file system
while True:
    test = (guess1+guess2)/2.0    # get a value between the 2 guesses (bisection)
    PPL.store(variable, test)     # store it in the PPL world
    res = PPL.eval(expression)    # evaluate the input expression
    if res == 0:                  # found the 0
        return test
    if res < 0:                   # cut right or left
        guess1 = test
    else:
        guess2 = test
#end

// This is a PPL "exported" function (meaning that it is accessible from outside the program; a system global, if you wish).
export test_function()
begin
local a= 2, b, c=1; // Creates 3 local variables
solve('sin(a)+cos(b)=c', b, 4,6); // This will call the Python VM with the text from
// the #python solve line to the #end and the 4 arguments as strings.
// Note that if the Python code tries evaluating the expression, I need to locate the local variables!
end;

Of course this could be implemented in other ways; this is just an example of the type of interactions that might happen.
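
For readers who want to try the bisection loop on a PC, here is a self-contained sketch under stated assumptions. PPLStub is a hypothetical stand-in for the PPL bridge (the real PPL.store and PPL.eval do not exist yet), the equation is rewritten as an expression that should equal 0, and a tolerance is added so the loop also terminates when floating point never hits exactly 0.

```python
import math


class PPLStub:
    """Hypothetical stand-in for the PPL side: a dict of variables
    plus a tiny expression evaluator. Not a real API."""

    def __init__(self, **variables):
        self.variables = dict(variables)

    def store(self, name, value):
        self.variables[name] = value

    def eval(self, expression):
        scope = {"sin": math.sin, "cos": math.cos}
        scope.update(self.variables)
        # Evaluate with no builtins; only the names in scope are visible.
        return eval(expression, {"__builtins__": {}}, scope)


def solve(ppl, expression, variable, guess1, guess2, tolerance=1e-9):
    """Bisection, assuming expression increases on [guess1, guess2]."""
    while guess2 - guess1 > tolerance:
        test = (guess1 + guess2) / 2.0
        ppl.store(variable, test)      # make the guess visible to "PPL"
        res = ppl.eval(expression)     # evaluate in the "PPL" context
        if res == 0:
            return test
        if res < 0:
            guess1 = test              # root is to the right
        else:
            guess2 = test              # root is to the left
    return (guess1 + guess2) / 2.0


ppl = PPLStub(a=2, c=1)                # plays the role of the PPL locals
root = solve(ppl, "sin(a) + cos(b) - c", "b", 4, 6)
```

The point of the stub is the lookup path: the expression refers to a and c, which live on the "PPL" side, while the Python loop only ever touches b through store().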



>2. Once you're in PPL and you want to call Python, are you going to call Python by calling the C-implementation directly or by creating a valid string expression and executing it?

At the moment, I am going through string representations.
When the user does python(program, arguments), the PPL interpreter (in C) will call execute_from_lexer after having initialized the VM and created argv.
Are there other ways that I could do it?
Among other things, I would like to avoid having to transform the arguments into strings. PPL does have object types, and they could be passed directly to Python using the equivalent Python types.

cyrille

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Post by stijn » Thu Jun 28, 2018 2:56 pm

Cyrille de Brébisson wrote:
Thu Jun 28, 2018 10:50 am
>At the moment, I am going through string representations.
>When the user does python(program, arguments), the PPL interpreter (in C) will call execute_from_lexer after having initialized the VM and created argv.
>Are there other ways that I could do it?
>Among other things, I would like to avoid having to transform the arguments into strings. PPL does have object types, and they could be passed directly to Python using the equivalent Python types.

You could create functions that convert your PPL objects to uPy objects (e.g. if you have an integer in C, you create a uPy object out of it using mp_obj_new_int(myInt)) and then make those accessible in the Python script scope (*). You can also register C functions etc. that way. To get them in scope, either use mp_store_global or create a module which can then be imported in the Python script (using mp_obj_new_module, mp_module_register). The latter is in my opinion the nicest and safest (no name clashes, clear scope distinction, no linters complaining about unknown global variables, ...).

(*) Note that the script could also write to that integer; getting that value back into PPL is another thing to do, if needed. This can be avoided by using functions to get PPL variables. Anyway, there are a couple of possibilities; it's hard to tell without seeing an example use case.
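
To show what the module option looks like from the script's point of view, here is a pure-Python analogue. It is not the C API: types.ModuleType and sys.modules stand in for mp_obj_new_module and mp_module_register, and ppl_bridge, a and double are made-up names.

```python
import sys
import types


def register_bridge_module(name, values):
    """Create a module object, fill it, and make it importable."""
    module = types.ModuleType(name)
    for key, value in values.items():
        setattr(module, key, value)
    sys.modules[name] = module  # after this, "import <name>" finds it
    return module


# Host side: expose one converted value and one host function.
register_bridge_module("ppl_bridge", {"a": 2, "double": lambda x: 2 * x})

# Script side: everything from the host arrives under one clear name.
script = "import ppl_bridge\nresult = ppl_bridge.double(ppl_bridge.a)"
scope = {}
exec(script, scope)
```

After the script runs, scope contains result but no stray a or double, which is the "no name clashes, clear scope distinction" benefit compared with storing everything as globals.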
