Page 1 of 1
Rolling dices on gc_mark_subtree !?
Posted: Thu May 20, 2021 9:06 am
by arnon
Hi,
test code is:
a = [x for x in range(120)]
a = [x for x in range(120)]
gc.collect()
I've seen that gc_mark_subtree runs on the elements of 'a' looking for a pointer ("if (VERIFY_PTR(ptr)) {").
On random values of 'a' (or any other value sources) element may pass the test though its not a pointer - that's a a BUG !
Same goes for numpy arrays etc.
Any thoughts ?
Re: Rolling dices on gc_mark_subtree !?
Posted: Thu May 20, 2021 11:10 am
by stijn
Why do you think that is a bug? As far as I'm aware there's nothing in the C standard which prohibits read operations on arbitrary values. Also none of the values of the integer objects (these aren't plain C ints!) in a will return true for VERIFY_PTR.
Re: Rolling dices on gc_mark_subtree !?
Posted: Thu May 20, 2021 1:21 pm
by arnon
So here's another code which demonstrates the bug:
import gc
f = open('c:\\wavs\\tone 2kHz.wav', 'rb')
x = f.read(64)
x = f.read(64)
gc.collect()
and the partial log from running gc_mark_subtree (with additional notes):
(1) 008E0590 - 4 x INT32
(2) 002B6D88 00005567 00000040 008E0540 <-- (pointer was found on 008E0540)
(3) 008E0540 - 20 x INT32
(4) 20C03F44 DF400001 A687C0BC 85C8926C 85C88179 A688926B DF40C0BD 20C00000
(5) 59793F44 7A386D94 7A387E88 59786D95 20C03F44 DF410000 A687C0BC 85C7926C
The data printed on lines (4) and (5) is the raw data from my WAV file which may hold any values,
still it's being scanned and some values may pass "VERIFY_PTR".
Please correct me if I'm wrong....
My partial code, I just added few printf's, the rest is the same.
void gc_mark_subtree(size_t block) {
// Start with the block passed in the argument.
size_t sp = 0;
for (;;) {
// work out number of consecutive blocks in the chain starting with this one
size_t n_blocks = 0;
do {
n_blocks += 1;
} while (ATB_GET_KIND(block + n_blocks) == AT_TAIL);
// check this block's children
void **ptrs = (void **)PTR_FROM_BLOCK(block);
printf("\n%p - %d x INT32\n", ptrs, n_blocks * BYTES_PER_BLOCK / sizeof(void *)); ccc = 0;
for (size_t i = n_blocks * BYTES_PER_BLOCK / sizeof(void *); i > 0; i--, ptrs++) {
void *ptr = *ptrs;
printf("%p ", ptr);
if (!(++ccc & 7)) printf("\n");
if (VERIFY_PTR(ptr)) {
// Mark and push this pointer
printf("<-- ");
Re: Rolling dices on gc_mark_subtree !?
Posted: Thu May 20, 2021 3:22 pm
by stijn
Sorry my question wasn't clear; I mean: are you sure this is a bug causing actual problems, as opposed to being by design? I honestly don't know which it is, I'm just assuming the latter because it seems to make sense; could be that i'm missing something but worst-case would be that this arbitrary value passes VERIFY_PTR and is AT_HEAD and the gc would mark that pointer which might not be correct in case the block is actually not in use anymore. In which case it won't get sweeped. But the likelihood seems pretty small at first sight and the adverse effect (less free memory) not too severe.
Re: Rolling dices on gc_mark_subtree !?
Posted: Thu May 20, 2021 3:37 pm
by Christian Walther
This is not a bug, false positives in pointer detection are expected and, as stijn mentions, do not affect the correctness of GC operation. Watch jimmo’s talk at
https://melbournemicropythonmeetup.gith ... 20-Meetup/ for more details about how the garbage collector works.
Re: Rolling dices on gc_mark_subtree !?
Posted: Thu May 20, 2021 7:22 pm
by stijn
Thanks for confirming. And for that link, didn't all that info gets saved.
Re: Rolling dices on gc_mark_subtree !?
Posted: Sat May 22, 2021 5:10 am
by arnon
Thanks Christian for your answer.
I think the way GC works right now contradicts the concept of being "lean and efficient" and the spirit of realtime imbedded programming.
My plan is to add a "GC marking" method to objects so it would not relay on excessive search, while maintaining backwards compatibility.
Do you know if someone tried it before ?
Re: Rolling dices on gc_mark_subtree !?
Posted: Sat May 22, 2021 6:29 am
by stijn
I think the way GC works right now contradicts the concept of being "lean and efficient"
That depends on your definition of these. In light of MicroPython the focus is on only features which are needed and small code size, les on performance, to be able to run on small microcontrollers. This GC implementation does that exactly.
and the spirit of realtime imbedded programming.
Only if you don't take care about dealing with the GC (as in: preallocate as much as possible, run gc.collect() in idle time, ...). Not 100% sure what you mean with 'GC marking' but it likely isn't going to change much in that regard because scanning will still take a relatively long time in comparison with other operations. Perhaps just less long than what is implemented now, but it will cost valueable memory because all objects get larger, and as said earlier that is one of MicroPython's key features.
What could work wrt performace are things like separate GC areas e.g. being able to allocate pieces which then do not get scanned at all because you know on beforehand they shouldn't get sweeped. This and other ideas can be found on
https://github.com/micropython/micropython/ (look for 'garbage collection' etc)
Re: Rolling dices on gc_mark_subtree !?
Posted: Sat May 22, 2021 1:32 pm
by arnon
I understand there's a tradeoff between object size's and time to complete GC but from my point of view (I integrate MP to "DSP group" chips) scanning raw data during GC seems like something I'd like to avoid, and as a new feature I can put it under compilation flag.
I think MP can be improved much farther, for example I've modified "gc_sweep" to run x6 faster (I'm also improving ULAB).