/unix/seg_helpers.c

p-i-
Posts: 20
Joined: Sun Sep 14, 2014 2:24 pm

/unix/seg_helpers.c

Post by p-i- » Tue Sep 23, 2014 4:02 pm

This file contains the following:

/*
  This is a stub used to create the symbols __bss_start and _end in a Mach-O object file.
  Thoses are needed by the GC, and should point to the start and end of the bss section.
  We reach this goal by linking this file last (putting _end at the end...), and using an 
  order file (order.def) to move __bss_start at the start of bss.

  TODO: Some pragma to do it inline ?
*/

char __bss_start = 0;
char _end = 0;

I can't make head or tail of that comment. Could someone explain it in a way that is understandable to someone who is familiar with coding C but not with memory management / garbage collection / compiler internals?

π

blmorris
Posts: 348
Joined: Fri May 02, 2014 3:43 pm
Location: Massachusetts, USA

Re: /unix/seg_helpers.c

Post by blmorris » Tue Sep 23, 2014 4:37 pm

It's needed for the unix port to build on OSX; I don't know the details, but somehow this gives the OSX toolchain some of the information it needs to link into the low-level assembler stuff used by the garbage collector. When gcc builds the unix port on Linux this bit isn't necessary, but clang needs this extra bit for some reason. (This doesn't answer your question about gc internals, but I thought this bit might not be necessary for your project. I only stumbled across this file while tweaking the unix Makefile to work for OSX earlier today.)

-Bryan

dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada

Re: /unix/seg_helpers.c

Post by dhylands » Wed Sep 24, 2014 1:11 am

As far as what text, data, and bss are, here's a web page that describes them: http://mcuoneclipse.com/2013/04/14/text ... explained/

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: /unix/seg_helpers.c

Post by stijn » Wed Sep 24, 2014 7:51 am

BSS_START and _end (declared in gccollect.c) must point to the start and end of the bss section, which is used for static data. When scanning for objects to clean, the gc must also look at this section, otherwise it might clean memory that is actually still in use: for instance, a statically allocated map might contain a pointer to something allocated elsewhere. If the gc didn't scan the static map, it would find no pointers to the object, conclude that the object isn't referenced anywhere, and mark it for deletion. That causes undefined behaviour if the map still references the object afterwards. (This is maybe a bit oversimplified, but it's the basic principle.) I posted a couple of links a while ago in the development section with nice info on GC, and there's a bunch of github issues in which more is explained.

Now, gcc itself allows you to get the bss start and end pointers through the predefined __bss_start and _end symbols. Other compilers (clang, msvc) don't have these, so another mechanism has to be used to find out where the section starts and ends. In this case, for OSX, this is done by declaring two variables and making sure __bss_start is the very first static variable in the executable and _end is the last one, thereby mimicking gcc's behavior. As such, the address of __bss_start is the start of the section and the address of _end is the end of it. Look in the generated map file for confirmation. How this is done is specified in the comment.
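
To make the scanning idea concrete, here is a minimal sketch of a conservative scan over a region of static data. It is not MicroPython's gc code: the toy heap, the `toy_*` names and the `toy_statics` array (standing in for the range between __bss_start and _end) are all assumptions made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_BLOCKS 8
#define BLOCK_SIZE  16

/* Toy heap: fixed-size blocks plus one mark flag per block (hypothetical). */
static unsigned char toy_heap[HEAP_BLOCKS][BLOCK_SIZE];
static bool toy_marked[HEAP_BLOCKS];

/* Stand-in for the static data section. In the unix port the bounds would
 * come from the __bss_start and _end symbols instead of an array. */
static void *toy_statics[4];

/* If word looks like an address inside toy_heap, mark the block it lands in. */
static void toy_mark_if_heap_ptr(uintptr_t word) {
    uintptr_t lo = (uintptr_t)&toy_heap[0][0];
    uintptr_t hi = lo + sizeof(toy_heap);
    if (word >= lo && word < hi) {
        toy_marked[(word - lo) / BLOCK_SIZE] = true;
    }
}

/* Conservatively scan [start, end): every pointer-sized word is treated
 * as a possible pointer into the heap. */
static void toy_scan_region(void *start, void *end) {
    for (void **p = (void **)start; p < (void **)end; p++) {
        toy_mark_if_heap_ptr((uintptr_t)*p);
    }
}

/* Clear all marks, scan the static region, return how many blocks are marked. */
static size_t toy_mark_from_statics(void) {
    size_t n_words = sizeof(toy_statics) / sizeof(toy_statics[0]);
    size_t n = 0;
    memset(toy_marked, 0, sizeof(toy_marked));
    toy_scan_region(toy_statics, toy_statics + n_words);
    for (size_t i = 0; i < HEAP_BLOCKS; i++) {
        n += toy_marked[i] ? 1 : 0;
    }
    return n;
}
```

Any block that is not marked after this pass (and after the other roots, such as the stack, have been scanned) would be considered garbage. If the static region were skipped, a block referenced only from `toy_statics` would be freed while still in use, which is the failure described above.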

p-i-
Posts: 20
Joined: Sun Sep 14, 2014 2:24 pm

Re: /unix/seg_helpers.c

Post by p-i- » Wed Sep 24, 2014 2:01 pm

Thanks for the explanations.

I can see this is awkward in terms of creating a generic build.

My goal is still to be able to drop a wodge of C code into some C/C++project (on unspecified platform) and instantly have embedded Python.

I think this could eventually go in a /generic/ folder

But this issue seems to be a clear case of requiring separate machinery for non-gcc e.g. clang.

Is there any way around this?

I'm getting a little confused about Python-land vs C-land. This BSS_START to _end -- is this the space in which our C program stores its static data? Or is it where the embedded Python runtime stores its static data? Or both?

Would it be possible to get round this problem by (as dhylands suggested on IRC) manually keeping track of every external (C-land reference to a Python object), and then just throwing an array of these memory locations into the garbage collector?

Also, this garbage collector -- it is just for the Python runtime, yes? Or is it also being used to clean up unused C memory?

blmorris
Posts: 348
Joined: Fri May 02, 2014 3:43 pm
Location: Massachusetts, USA

Re: /unix/seg_helpers.c

Post by blmorris » Wed Sep 24, 2014 3:22 pm

Keep in mind that the particular technique used for garbage collection in Micro Python (mark and sweep) was chosen for its relative simplicity and deterministic performance in tightly constrained systems - initially a micro controller with 192kB of RAM (the STM32F405) and scalable to other systems with even less memory (Dave Hylands' Teensy port, for example). I imagine that it was important for the unix port to have the same gc implementation so that Python code destined for micro controllers could be developed in a unix environment that was as similar to the micro controller as possible.
This ability to run in tight spaces has also sparked some interest in running the unix port on embedded linux systems (wifi routers, etc).
On the other hand, I do recall some discussion, probably on Github but I don't know exactly where, suggesting that the current gc could be replaced if circumstances call for it (possibly with reference counting as used in CPython); one reason being that mark and sweep doesn't scale well to larger memory spaces (though this problem may have been partly resolved).
In other words, depending on the goals of your project there may be a better way to deal with memory management than Micro Python's current gc system.
-Bryan

dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada

Re: /unix/seg_helpers.c

Post by dhylands » Wed Sep 24, 2014 3:33 pm

p-i- wrote:Thanks for the explanations.

I can see this is awkward in terms of creating a generic build.

My goal is still to be able to drop a wodge of C code into some C/C++project (on unspecified platform) and instantly have embedded Python.

I think this could eventually go in a /generic/ folder

But this issue seems to be a clear case of requiring separate machinery for non-gcc e.g. clang.

Is there any way around this?
If there were we would have used it.
p-i- wrote: I'm getting a little confused about Python-land vs C-land. This BSS_START to _end -- is this the space in which our C program stores its static data? Or is it where the embedded Python runtime stores its static data? Or both?
BSS is where C stores uninitialized (or initialized to zero) global data. DATA is where C stores initialized global data.

An example of this would be:
https://github.com/micropython/micropyt ... cept.c#L72
or
https://github.com/micropython/micropyt ... .c#L49-L51

These C variables may point to python objects from the heap, and the GC needs to know that the pointers exist so that it doesn't free the objects from the heap.
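
As a rough illustration of the distinction (the `example_*` names are hypothetical and not taken from the MicroPython sources, and exact section placement is up to the toolchain):

```c
#include <stddef.h>

/* Uninitialised or zero-initialised globals: typically placed in BSS. */
static int   example_counter;      /* starts at 0 */
static void *example_cached_obj;   /* starts as NULL, may later point into the GC heap */

/* Globals with a non-zero initialiser: typically placed in DATA. */
static int   example_limit = 10;

/* Hypothetical helper: stash a heap pointer in a static variable.
 * Returns 0 once the (made-up) limit has been reached. */
static int example_remember(void *heap_obj) {
    if (example_counter >= example_limit) {
        return 0;
    }
    example_cached_obj = heap_obj;
    example_counter++;
    return 1;
}
```

After a call to `example_remember()`, the only reference to the object may live in `example_cached_obj`, so a collector that skips the static sections would never see it.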
p-i- wrote:Would it be possible to get round this problem by (as dhylands suggested on IRC) manually keeping track of every external (C-land reference to a Python object), and then just throwing an array of these memory locations into the garbage collector?
That's essentially what's happening when you throw the contents of BSS at the garbage collector. In the more generic case (embedding micropython inside another program), the other program will have its own heap, and that heap may contain pointers to python objects. The GC needs to know about those, but it doesn't know where to look to find them.
p-i- wrote:Also, this garbage collector -- it is just for the Python runtime, yes? Or is it also being used to clean up unused C memory?
It's for the GC managed heap. If C code happens to allocate from it, then it covers that case as well. In MicroPython, there is no heap, other than the GC managed one, so all heap allocations come from the GC managed heap. For example:
https://github.com/micropython/micropyt ... int.c#L171

p-i-
Posts: 20
Joined: Sun Sep 14, 2014 2:24 pm

Re: /unix/seg_helpers.c

Post by p-i- » Wed Sep 24, 2014 5:21 pm

I seem to have hit rock, just because I can't get BSS start and end from generic C-code.

I'm interested in the idea of implementing a reference counting system. If there is any existing discussion on this, please could someone link it? I can't find it...

The idea might be to have a preprocessor flag that determines whether the GC uses the current "mark and sweep" or reference counting.

Is all of this GC machinery only for the purpose of collecting Python objects?

How practical would it be to implement such a system?

Could someone outline the basic approach?

I am keen to dig into it.

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: /unix/seg_helpers.c

Post by stijn » Wed Sep 24, 2014 6:32 pm

p-i- wrote:Is there any way around this?
I know you're not looking for it, but using a library fixes most of your problems. Moreover, a library has its own bss section (as far as I know), which answers your other question: normally only the static data from micropython needs to be scanned; it's of no use to the gc to scan sections of the whole program.

p-i- wrote:Would it be possible to get round this problem by (as dhylands suggested on IRC) manually keeping track of every external (C-land reference to a Python object), and then just throwing an array of these memory locations into the garbage collector?
You would indeed need something like that. Part of this can be done automatically: create a micropython list and store it in the global micropython dict. Everything stored in the list will then get scanned by the gc, so you only have to take care of putting your stuff in the list. Also, if you create a custom class and add functions to it (the locals_dict member of the type object), make sure to store that in the list as well, otherwise it might get swept. I've had problems with that myself.
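
In plain C, the same idea of an explicit root list could look roughly like the sketch below. This does not use any MicroPython API; the fixed-size table and the `roots_*` functions are hypothetical names for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROOTS 32

/* Hypothetical registry of C-side references that a collector treats as roots. */
static void *root_table[MAX_ROOTS];
static size_t root_count;

/* Register obj as a root; returns false if the table is full. */
static bool roots_add(void *obj) {
    if (root_count == MAX_ROOTS) {
        return false;
    }
    root_table[root_count++] = obj;
    return true;
}

/* Remove one registration of obj; returns false if it was not registered. */
static bool roots_remove(void *obj) {
    for (size_t i = 0; i < root_count; i++) {
        if (root_table[i] == obj) {
            /* Fill the hole with the last entry, then clear the freed slot. */
            root_table[i] = root_table[--root_count];
            root_table[root_count] = NULL;
            return true;
        }
    }
    return false;
}

/* Call visit(obj, ctx) for every registered root, e.g. from a mark phase. */
static void roots_foreach(void (*visit)(void *obj, void *ctx), void *ctx) {
    for (size_t i = 0; i < root_count; i++) {
        visit(root_table[i], ctx);
    }
}

/* Example visitor: counts roots; ctx must point to a size_t. */
static void roots_count_visitor(void *obj, void *ctx) {
    (void)obj;
    (*(size_t *)ctx)++;
}
```

The cost of this approach is discipline: every C-side reference has to be added while it is live and removed afterwards, whereas scanning the whole bss section needs no bookkeeping from the embedding code.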
p-i- wrote:The idea might be to have a preprocessor flag that determines whether the GC uses the current "mark and sweep" or reference counting.
That would be nice but could also lead to tons of #ifdef all over the place. Not sure how much work reference counting requires.
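
For a feel of what reference counting involves, here is a minimal, generic sketch. It is not based on MicroPython's object model; the `rc_obj_t` header and the `rc_*` functions are assumptions for illustration, and malloc/free stand in for whatever allocator would really be used.

```c
#include <stdlib.h>

/* Hypothetical object header: every object carries its own reference count. */
typedef struct rc_obj {
    size_t refcount;
    void (*destroy)(struct rc_obj *self);  /* optional finaliser, may be NULL */
} rc_obj_t;

/* Number of objects currently allocated (only here to make the sketch observable). */
static size_t rc_live_objects;

/* Allocate an object owned by the caller (refcount starts at 1). */
static rc_obj_t *rc_new(void) {
    rc_obj_t *o = malloc(sizeof *o);
    if (o == NULL) {
        return NULL;
    }
    o->refcount = 1;
    o->destroy = NULL;
    rc_live_objects++;
    return o;
}

/* Take an additional reference. */
static rc_obj_t *rc_incref(rc_obj_t *o) {
    o->refcount++;
    return o;
}

/* Drop a reference; the object is freed when the last one goes away. */
static void rc_decref(rc_obj_t *o) {
    if (--o->refcount == 0) {
        if (o->destroy != NULL) {
            o->destroy(o);
        }
        free(o);
        rc_live_objects--;
    }
}
```

The work is less in these helpers than in the call sites: every place that stores, copies or drops an object pointer needs a matching `rc_incref()` or `rc_decref()`, which is where the #ifdef concern comes from. Plain reference counting also never reclaims objects that reference each other in a cycle.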
