[MIPS] Garbage collection crash during Unix port running on MIPS

Post by avinashbh » Mon May 03, 2021 3:25 pm

Hi All,

This is my first post here; I'm pretty new to MicroPython and to the community.

I ran into issues while running the MicroPython unix port on MIPS32. I have cross-compiled MicroPython for a MIPS32 target and am running it on OpenWrt.
The issue happens during garbage collection. When the garbage collector runs, it seems to mark some blocks as unreferenced although they are still in use. The crash shows up when the program calls a callback that was stored before the garbage collector ran: after a collection, the call either ends in SIGSEGV or fails because the object is no longer callable.
I looked into the source code of the unix port, especially the functions that run during garbage collection.
In the unix port, gc_helper_collect_regs_and_stack() is called, which reads the data from the registers. In my build, it uses the following version:

STATIC void gc_helper_get_regs(gc_helper_regs_t arr) {
setjmp(arr);
}

So my question is: should I fill the register array from specific registers, or is the setjmp() way of getting the register array sufficient? In which scenarios can it cause trouble?
Also, any pointers on where to look when debugging garbage-collection-related crashes would be very useful.
I can provide more information if needed.

Post by stijn » Mon May 03, 2021 3:56 pm

Can you show the code which reproduces the problem? Or otherwise provide some information like what kind of callback you mean? The setjmp implementation should be fine, I think. I mean the whole point of setjmp is to capture registers.

Post by avinashbh » Mon May 03, 2021 7:34 pm

I have stitched together the files.
The code below is the bare minimum that crashes. I have marked the point of the crash as "A", which is where the callback is actually called; "B" is the callback itself.
Let me know if something is not clear. Thanks in advance!

Code:

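##################### util.py ######################
def _return_must_satisfy(condition_check, func, args, errorType):
    # A. point of callback crash
    ret = func(*args)
    if condition_check(ret):
        return ret
    raise errorType


def must_return(val, func, errorType=AssertionError):
    def wrapper(*args):
        return _return_must_satisfy(lambda x: x is val, func, args, errorType)

    return wrapper


# must_not_return is used below but was not part of the posted snippet;
# assumed here to be the inverse of must_return.
def must_not_return(val, func, errorType=AssertionError):
    def wrapper(*args):
        return _return_must_satisfy(lambda x: x is not val, func, args, errorType)

    return wrapper


################## mosquitto.py #######################
import ctypes
import ffi
import sys
import struct
from util import must_return, must_not_return

_mosquitto = ffi.open("libmosquitto.so")

MOSQ_ERR_SUCCESS = 0

# _int_size was not defined in the posted snippet; assumed to be the pointer size.
_int_size = struct.calcsize("P")
INT_TYPE = ctypes.UINT64 if _int_size == 8 else ctypes.UINT32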

_mosquitto_message_t = {
    "mid": 0 * _int_size | INT_TYPE,
    "topic": 1 * _int_size | INT_TYPE,
    "payload": 2 * _int_size | INT_TYPE,
    "payloadlen": 3 * _int_size | INT_TYPE,
    "qos": 4 * _int_size | INT_TYPE,
    "retain": 5 * _int_size | INT_TYPE
}

class MqttError(Exception):
    pass

#  int mosquitto_lib_init(void)
_mosquitto_lib_init = must_return(MOSQ_ERR_SUCCESS, _mosquitto.func("i", "mosquitto_lib_init", ""), MqttError)

_mosquitto_lib_init()  # start the library right away

# struct mosquitto *mosquitto_new(const char *id, bool clean_session, void *obj)
mosquitto_new = must_not_return(None, _mosquitto.func("p", "mosquitto_new", "sip"), MqttError)

# int mosquitto_connect(struct mosquitto *mosq, const char *host, int port, int keepalive)
mosquitto_connect = must_return(MOSQ_ERR_SUCCESS, _mosquitto.func("i", "mosquitto_connect", "psii"), MqttError)

# int mosquitto_subscribe(struct mosquitto *mosq, int *mid, const char *sub, int qos)
mosquitto_subscribe = must_return(MOSQ_ERR_SUCCESS, _mosquitto.func("i", "mosquitto_subscribe", "ppsi"), MqttError)


# int mosquitto_loop(struct mosquitto *mosq, int timeout, int max_packets)
mosquitto_loop = must_return(MOSQ_ERR_SUCCESS, _mosquitto.func("i", "mosquitto_loop", "pii"), MqttError)


# void mosquitto_message_callback_set(struct mosquitto *mosq, void
# (*on_message)(struct mosquitto *, void *, const struct mosquitto_message
# *))
_mosquitto_message_callback_set = _mosquitto.func("v", "mosquitto_message_callback_set", "pC")


def mosquitto_message_callback_set(mosq, cb):
    wrapped_cb = ffi.callback("v", cb, "ppp")
    _mosquitto_message_callback_set(mosq, wrapped_cb)

    return wrapped_cb # this must be stored somewhere to avoid garbage collection

# int mosquitto_topic_matches_sub(const char *sub, const char *topic, bool *result)
mosquitto_topic_matches_sub = must_return(MOSQ_ERR_SUCCESS, _mosquitto.func("i", "mosquitto_topic_matches_sub", "ssp"),
                                          MqttError)

#########################driver code###################
from mosquitto import *
import gc
import time


class Mqtt:
    def __init__(self, clientid, username=None, password=None, hostname="localhost", port=1883, keepalive=60):
        self._mqtt = mosquitto_new(clientid, 1, None)
        # return value must be stored to avoid it being garbage collected
        self._msg_cb = mosquitto_message_callback_set(self._mqtt, self._on_message)
        mosquitto_connect(self._mqtt, hostname, port, keepalive)
        mosquitto_subscribe(self._mqtt, None, "test_crash", 0)
        self._count = 0
        print("running")
        while True:
            time.sleep_ms(0)
            r = mosquitto_loop(self._mqtt, 1, 1)
            if r:
                print("there was an error", r)
				
    # B. Callback
    def _on_message(self, mosq, ctx, msg):
        self._count += 1
        if not self._count % 10:
            print("still alive, {} heap size {}".format(self._count, gc.mem_alloc()))


m = Mqtt("test_crash", "crash", "pwd")

Post by stijn » Tue May 04, 2021 7:31 am

Very clear, thanks, but I don't have MQTT etc. installed so I didn't actually try it. Have you tried running this on the unix port on Ubuntu or similar, possibly also using the setjmp implementation? That would be an interesting test: if there are no problems there, it could be that setjmp on MIPS is the problem. Otherwise it is likely that the problem is in the mixing of C and MicroPython callbacks. Plus, if it reproduces the problem on Ubuntu, it might be easier to debug. With regard to that: I'd start by figuring out whether the object created when self._on_message is passed as an argument to ffi.callback gets collected.

Post by avinashbh » Tue May 04, 2021 8:33 am

So unix port on Ubuntu works without any problem.

Post by stijn » Tue May 04, 2021 12:56 pm

Also when built with CFLAGS_EXTRA='-DMICROPY_GCREGS_SETJMP=1'? If so, that would probably mean (as you hinted already) that you'd have to provide a gc_helper_get_regs implementation which saves the correct registers.

Post by avinashbh » Thu May 06, 2021 7:05 am

The peculiar thing about the issue is that it doesn't appear on an Intel CPU. I also managed to get a minimal working example which doesn't require mosquitto.

Code:

import ffi
import uctypes
import random
import gc
gc.threshold(50 * 1024) # make it crash faster
try:
    libc = ffi.open("libc.so.1")
except:
    libc = ffi.open("libc.so.6")
qsort = libc.func("v", "qsort", "piiC")
class ClassCmp:
    def cmp(self, pa, pb):
        a = uctypes.bytearray_at(pa, 1)
        b = uctypes.bytearray_at(pb, 1)
        return a[0] - b[0]
c = ClassCmp()
cmp_cb = ffi.callback("i", c.cmp, "pp")
# instead, store c.cmp so it doesn't get collected: no crash
# k = c.cmp
# cmp_cb = ffi.callback("i", k, "pp")
s = b"foobar"
while True:
    qsort(s, len(s), 1, cmp_cb)
    a = b"5" * 1021 # use some memory, eventually overwriting c.cmp passed to cmp_cb

As can be seen, the 2nd parameter passed to ffi.callback() is the object that gets garbage collected. If the commented-out lines are used instead, it doesn't crash.
What is intriguing is that the same object doesn't get collected when the unix port runs on Ubuntu on Intel.
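
One detail that may help explain why storing the method in a variable matters: in Python, every evaluation of c.cmp builds a new bound-method object, so the object handed to ffi.callback() has no other reference unless the caller stores it. The sketch below shows this in plain Python, without ffi; the class mirrors the one above but takes integers for simplicity. As far as I understand, MicroPython behaves the same way and allocates the bound method on its GC heap, but treat that as an assumption.

```python
class ClassCmp:
    def cmp(self, pa, pb):
        # Simplified comparator: takes ints instead of pointers.
        return pa - pb


c = ClassCmp()

# Each attribute access creates a separate bound-method object.
first = c.cmp
second = c.cmp
print(first is second)  # False

# Both objects still call the same underlying function.
print(first(3, 1), second(3, 1))  # 2 2
```

So ffi.callback("i", c.cmp, "pp") receives a temporary object, whereas k = c.cmp keeps that exact object alive through the name k.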

Post by stijn » Thu May 06, 2021 10:45 am

Yeah, that's what I suspected already. Thanks for the repro code, btw. I am not sure, but I think this is a bug in MicroPython's FFI code and that it is a coincidence that the unix port on x86/x64 doesn't collect the callback (e.g. because it happens to still have the callback on the stack or in a register where the GC sees it, and/or because FFI on MIPS behaves differently than on x86 or x64).

Again, not sure, but the thing is that the callback function implementation, mod_ffi_callback() in modffi.c, takes the MicroPython callback and passes it into FFI without storing it anywhere else. And since I don't immediately see anything which guarantees that what FFI allocates is reachable by the GC, it's perfectly normal that the callback gets GC'd, because the pointer is only stored as a void* somewhere in a piece of memory allocated by FFI (hence, I assume, from the C heap).
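
To make the reachability argument concrete, here is a toy mark-and-sweep model in plain Python. It is only an illustration of the reasoning above, not MicroPython's actual collector: ToyHeap, run() and the object names are all hypothetical, and the C heap is modelled as a dict that the collector never scans.

```python
class ToyHeap:
    """A tiny mark-and-sweep model: objects are names, edges are references."""

    def __init__(self):
        self.refs = {}      # object name -> names it references
        self.roots = set()  # names visible from stack, registers or globals

    def alloc(self, name, refs=()):
        self.refs[name] = list(refs)

    def collect(self):
        # Mark phase: follow references starting from the roots only.
        marked = set()
        stack = [r for r in self.roots if r in self.refs]
        while stack:
            name = stack.pop()
            if name in marked:
                continue
            marked.add(name)
            stack.extend(n for n in self.refs[name] if n in self.refs)
        # Sweep phase: free everything that was not marked.
        freed = sorted(set(self.refs) - marked)
        for name in freed:
            del self.refs[name]
        return freed


def run(keep_py_func_in_wrapper):
    heap = ToyHeap()
    heap.alloc("bound_method")  # stands for the c.cmp object given to ffi.callback
    # Memory from the C heap is outside the GC heap, so a pointer stored there
    # is invisible to collect(). It is modelled as a dict that is never scanned.
    c_heap_closure = {"user_data": "bound_method"}
    wrapper_refs = ["bound_method"] if keep_py_func_in_wrapper else []
    heap.alloc("fficallback", wrapper_refs)
    heap.roots.add("fficallback")  # cmp_cb is a global, so it acts as a root
    return heap.collect(), c_heap_closure


print(run(keep_py_func_in_wrapper=False)[0])  # ['bound_method']
print(run(keep_py_func_in_wrapper=True)[0])   # []
```

In the first run the only pointer to the bound method lives in the unscanned dict, so the sweep frees it while the C side still intends to call it. In the second run the wrapper object references it, so it survives.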

Could you try this workaround as well: in modffi.c in the definition of mp_obj_fficallback_t, add an mp_obj_t for explicitly storing the callback:

Code:

typedef struct _mp_obj_fficallback_t {
    mp_obj_base_t base;
    void *func;
    mp_obj_t py_func;  //add this one
    ffi_closure *clo;
    char rettype;
    ffi_cif cif;
    ffi_type *params[];
} mp_obj_fficallback_t;

Then, in mod_ffi_callback, assign py_func:

Code:

...
mp_obj_fficallback_t *o = m_new_obj_var(mp_obj_fficallback_t, ffi_type *, nparams);
o->py_func = func_in;
...

This makes sure the callback is explicitly stored in this object, so the GC should see it and will not collect it.
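
For anyone who cannot rebuild MicroPython, the same idea can be approximated from the Python side, in line with the k = c.cmp variant in the repro script. The sketch below is hypothetical: KeptCallback and make_callback are names made up for illustration, and fake_ffi_callback is only a stand-in so the example runs without the ffi module. On MicroPython one would pass ffi.callback as make_callback.

```python
class KeptCallback:
    """Holds an ffi callback together with the Python callable it wraps."""

    def __init__(self, make_callback, rettype, func, argtypes):
        # Storing func keeps the bound method reachable for as long as
        # this KeptCallback instance itself is reachable.
        self.func = func
        self.cb = make_callback(rettype, func, argtypes)


def fake_ffi_callback(rettype, func, argtypes):
    # Stand-in for ffi.callback so the sketch runs on any Python.
    return ("callback", rettype, argtypes)


class ClassCmp:
    def cmp(self, pa, pb):
        return pa - pb


c = ClassCmp()
kept = KeptCallback(fake_ffi_callback, "i", c.cmp, "pp")
print(kept.cb)          # ('callback', 'i', 'pp')
print(kept.func(5, 2))  # 3
```

With the real module, the call would look like qsort(s, len(s), 1, kept.cb), and kept has to stay referenced for as long as the C library may invoke the callback.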

Sidenote: the FFI code calls ffi_closure_alloc but never ffi_closure_free; unless I'm missing something, that is a memory leak.

Post by avinashbh » Mon May 17, 2021 12:01 pm

Your suggested change seems to work, at least within my test setup. Do you plan to implement this change in the unix port?

Post by stijn » Mon May 17, 2021 1:17 pm

Thanks for the feedback.
avinashbh wrote: "Do you plan to implement this change in the unix port?"
Probably, unless someone can point out that it is actually a problem with setjmp or so. I'd really like to figure out how it is possible that the unix port running on x64 Ubuntu doesn't have the same problem. I reported this here: https://github.com/micropython/micropython/issues/7273