Rolling dices on gc_mark_subtree !?

C programming, build, interpreter/VM.
Target audience: MicroPython Developers.
Post Reply
arnon
Posts: 4
Joined: Thu May 20, 2021 8:53 am

Rolling dices on gc_mark_subtree !?

Post by arnon » Thu May 20, 2021 9:06 am

Hi,
test code is:
a = [x for x in range(120)]
a = [x for x in range(120)]
gc.collect()

I've seen that gc_mark_subtree runs on the elements of 'a' looking for a pointer ("if (VERIFY_PTR(ptr)) {").
On random values of 'a' (or any other value sources) element may pass the test though its not a pointer - that's a a BUG !
Same goes for numpy arrays etc.

Any thoughts ?

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: Rolling dices on gc_mark_subtree !?

Post by stijn » Thu May 20, 2021 11:10 am

Why do you think that is a bug? As far as I'm aware there's nothing in the C standard which prohibits read operations on arbitrary values. Also none of the values of the integer objects (these aren't plain C ints!) in a will return true for VERIFY_PTR.

arnon
Posts: 4
Joined: Thu May 20, 2021 8:53 am

Re: Rolling dices on gc_mark_subtree !?

Post by arnon » Thu May 20, 2021 1:21 pm

So here's another code which demonstrates the bug:
import gc
f = open('c:\\wavs\\tone 2kHz.wav', 'rb')
x = f.read(64)
x = f.read(64)
gc.collect()

and the partial log from running gc_mark_subtree (with additional notes):

(1) 008E0590 - 4 x INT32
(2) 002B6D88 00005567 00000040 008E0540 <-- (pointer was found on 008E0540)
(3) 008E0540 - 20 x INT32
(4) 20C03F44 DF400001 A687C0BC 85C8926C 85C88179 A688926B DF40C0BD 20C00000
(5) 59793F44 7A386D94 7A387E88 59786D95 20C03F44 DF410000 A687C0BC 85C7926C

The data printed on lines (4) and (5) is the raw data from my WAV file which may hold any values,
still it's being scanned and some values may pass "VERIFY_PTR".
Please correct me if I'm wrong....


My partial code, I just added few printf's, the rest is the same.

void gc_mark_subtree(size_t block) {
// Start with the block passed in the argument.
size_t sp = 0;
for (;;) {
// work out number of consecutive blocks in the chain starting with this one
size_t n_blocks = 0;
do {
n_blocks += 1;
} while (ATB_GET_KIND(block + n_blocks) == AT_TAIL);

// check this block's children
void **ptrs = (void **)PTR_FROM_BLOCK(block);
printf("\n%p - %d x INT32\n", ptrs, n_blocks * BYTES_PER_BLOCK / sizeof(void *)); ccc = 0;
for (size_t i = n_blocks * BYTES_PER_BLOCK / sizeof(void *); i > 0; i--, ptrs++) {
void *ptr = *ptrs;
printf("%p ", ptr);
if (!(++ccc & 7)) printf("\n");

if (VERIFY_PTR(ptr)) {
// Mark and push this pointer
printf("<-- ");

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: Rolling dices on gc_mark_subtree !?

Post by stijn » Thu May 20, 2021 3:22 pm

Sorry my question wasn't clear; I mean: are you sure this is a bug causing actual problems, as opposed to being by design? I honestly don't know which it is, I'm just assuming the latter because it seems to make sense; could be that i'm missing something but worst-case would be that this arbitrary value passes VERIFY_PTR and is AT_HEAD and the gc would mark that pointer which might not be correct in case the block is actually not in use anymore. In which case it won't get sweeped. But the likelihood seems pretty small at first sight and the adverse effect (less free memory) not too severe.

Christian Walther
Posts: 169
Joined: Fri Aug 19, 2016 11:55 am

Re: Rolling dices on gc_mark_subtree !?

Post by Christian Walther » Thu May 20, 2021 3:37 pm

This is not a bug, false positives in pointer detection are expected and, as stijn mentions, do not affect the correctness of GC operation. Watch jimmo’s talk at https://melbournemicropythonmeetup.gith ... 20-Meetup/ for more details about how the garbage collector works.

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: Rolling dices on gc_mark_subtree !?

Post by stijn » Thu May 20, 2021 7:22 pm

Thanks for confirming. And for that link, didn't all that info gets saved.

arnon
Posts: 4
Joined: Thu May 20, 2021 8:53 am

Re: Rolling dices on gc_mark_subtree !?

Post by arnon » Sat May 22, 2021 5:10 am

Thanks Christian for your answer.
I think the way GC works right now contradicts the concept of being "lean and efficient" and the spirit of realtime imbedded programming.
My plan is to add a "GC marking" method to objects so it would not relay on excessive search, while maintaining backwards compatibility.
Do you know if someone tried it before ?

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: Rolling dices on gc_mark_subtree !?

Post by stijn » Sat May 22, 2021 6:29 am

I think the way GC works right now contradicts the concept of being "lean and efficient"
That depends on your definition of these. In light of MicroPython the focus is on only features which are needed and small code size, les on performance, to be able to run on small microcontrollers. This GC implementation does that exactly.
and the spirit of realtime imbedded programming.
Only if you don't take care about dealing with the GC (as in: preallocate as much as possible, run gc.collect() in idle time, ...). Not 100% sure what you mean with 'GC marking' but it likely isn't going to change much in that regard because scanning will still take a relatively long time in comparison with other operations. Perhaps just less long than what is implemented now, but it will cost valueable memory because all objects get larger, and as said earlier that is one of MicroPython's key features.

What could work wrt performace are things like separate GC areas e.g. being able to allocate pieces which then do not get scanned at all because you know on beforehand they shouldn't get sweeped. This and other ideas can be found on https://github.com/micropython/micropython/ (look for 'garbage collection' etc)

arnon
Posts: 4
Joined: Thu May 20, 2021 8:53 am

Re: Rolling dices on gc_mark_subtree !?

Post by arnon » Sat May 22, 2021 1:32 pm

I understand there's a tradeoff between object size's and time to complete GC but from my point of view (I integrate MP to "DSP group" chips) scanning raw data during GC seems like something I'd like to avoid, and as a new feature I can put it under compilation flag.
I think MP can be improved much farther, for example I've modified "gc_sweep" to run x6 faster (I'm also improving ULAB).

Post Reply