Lock causing intermittent crash

General discussions and questions about development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
mgag
Posts: 5
Joined: Sat Aug 04, 2018 4:07 pm

Post by mgag » Thu Oct 10, 2019 9:39 pm

I thought I was "doing the right thing" by protecting a variable with a lock, but I have discovered that using the lock is causing a hard crash. To create a small snippet to demonstrate the issue, I started with code from here.

I am using MicroPython:

Code:

>>> os.uname()
(sysname='pyboard', nodename='pyboard', release='1.11.0', version='v1.11 on 2019-05-29', machine='PYBv1.1 with STM32F405RG')

The MicroPython code (the test_05 module that the PC script imports) looks like this:

Code:

import pyb, micropython
import _thread

micropython.alloc_emergency_exception_buf(100)


class Foo():

    def __init__(self):
        self.lock = _thread.allocate_lock()
        self.bar_ref = self.bar  # Allocation occurs here
        tim = pyb.Timer(4)
        tim.init(freq=1)
        tim.callback(self.cb)

    def _add(self, item):
        # accessed by 'REPL' and via Timer callback thru micropython.schedule()
        with self.lock:
            i = 0
            while i < 10000:  # the longer this is the quicker fault occurs
                i += 1

    def bar(self, _):  # internally called by the scheduler via the Timer, this is not ISR context
        self._add(23)

    def cb(self, t):
        # Passing self.bar would cause allocation.
        micropython.schedule(self.bar_ref, 0)

    def ret(self, method=None, all=False):  # externally called by the PC via REPL
        print("something {}".format(method))
        self._add(method)
        return True


foo = Foo()
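The comments on `bar_ref` and in `cb()` rely on the fact that evaluating `self.bar` builds a new bound-method object every time, which is a heap allocation, and a pyb.Timer callback runs in interrupt context where allocation is not allowed. A small illustration in plain CPython (not pyboard code; object identity is used here only as a stand-in for "a new object was created"):

```python
class Foo:
    def __init__(self):
        # Create the bound method once, outside interrupt context (like bar_ref).
        self.bar_ref = self.bar

    def bar(self, _):
        pass


foo = Foo()
# Each evaluation of foo.bar builds a fresh bound-method object...
print(foo.bar is foo.bar)          # False
# ...whereas the cached reference is the same object every time.
print(foo.bar_ref is foo.bar_ref)  # True
```

Passing the cached `self.bar_ref` to `micropython.schedule()` therefore avoids creating a new object inside the Timer callback.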

And the PC-side code (Ubuntu 18, Python 3.6) looks like this:

Code:

from time import sleep
import ampy.pyboard as pyboard

pyb = pyboard.Pyboard("/dev/ttyACM0")
pyb.enter_raw_repl()

data, data_err = pyb.exec_raw("import test_05\n")
print(data, data_err)

for i in range(10000):
    data, data_err = pyb.exec_raw("test_05.foo.ret(method='bar', all=False)\n", timeout=10, data_consumer=None)
    print(i, data, data_err)
    sleep(0.1)

The code exercises `_add()` from two different "threads" (a loose use of the term): the Timer running locally on the board, and the PC, asynchronously, via the REPL. That is why I added the lock: to protect access to this function.

My findings:
1) Using `with self.lock:` in `_add()` triggers the fault. Without the lock, the fault is not triggered, or at least it takes much longer to appear.
2) The longer the function `_add()` takes, the easier it is to create the hard fault.
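One possible reading of findings 1 and 2, offered as an assumption and not confirmed anywhere in this thread: the scheduled `bar()` can run on the main thread while a REPL call is still inside `with self.lock:` in `_add()`, so that thread tries to take a lock it already holds. `_thread` locks are not re-entrant, so a blocking acquire there could never succeed, and the longer `_add()` holds the lock, the more likely a Timer tick lands inside that window. The sketch below runs under CPython; the function names are hypothetical, the timer is replaced by a direct call, and a non-blocking acquire is used so the demo terminates. It has not been tested on a pyboard, and what a stuck acquire looks like there (hang or fault) is not established here.

```python
import _thread

# Same kind of lock as in Foo.__init__; it is NOT re-entrant.
lock = _thread.allocate_lock()


def scheduled_bar():
    # Stands in for Foo.bar -> Foo._add running as a scheduled callback.
    # On the board this would be `with lock:`, i.e. a blocking acquire, which
    # could never succeed here: the thread that has to release the lock is the
    # same thread that is waiting for it. A non-blocking attempt shows the state.
    got_it = lock.acquire(0)
    if got_it:
        lock.release()
    return got_it


def add_from_repl():
    # Stands in for the REPL calling Foo.ret -> Foo._add.
    with lock:
        # Simulates the Timer's scheduled callback landing inside this block
        # on the same (main) thread.
        return scheduled_bar()


print(add_from_repl())  # False: the lock is already held by this very thread
print(lock.locked())    # False: released again once the with-block exits
```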

As you can guess from the snippet, in the real code I have a variable (a list) that is accessed both by the Timer function and by the PC (via the REPL). I removed that variable from the snippet because it's not the issue; the snippet doesn't create or access any shared variables.

I believe the snippet is following the rules outlined in the link above.

Am I doing something wrong?

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: Lock causing intermittent crash

Post by pythoncoder » Fri Oct 11, 2019 6:07 am

I'm not sure it's valid to use the lock object provided by _thread in this context as you're not actually using threads. You could implement your own mutex as a bound variable. But I don't think a mutex is required here as there is no actual concurrency: from the docs for micropython.schedule():
A scheduled function will never preempt another scheduled function.
Peter Hinch
Index to my micropython libraries.
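Peter's "mutex as a bound variable" could look something like this minimal sketch. It assumes, as described in this thread, that the only other code touching the data is a scheduled callback, which always runs to completion on the main thread. `SharedList`, `busy`, `add_from_repl` and `scheduled_add` are hypothetical names. The callback never waits on the flag; it backs off and reports failure so the caller can retry later (for example on the next timer tick):

```python
class SharedList:
    """Hypothetical sketch of a mutex kept as a plain bound attribute."""

    def __init__(self):
        self.items = []
        # Plain attribute, not a _thread lock. This is only enough because the
        # scheduled callback can interrupt the main code, never the reverse.
        self.busy = False

    def add_from_repl(self, item):
        # Main-thread code (e.g. called via the REPL). In real code the update
        # would span several statements; the flag marks it as in progress.
        self.busy = True
        try:
            self.items.append(item)
        finally:
            self.busy = False

    def scheduled_add(self, item):
        # Scheduled callback: never wait on the flag (nothing could clear it
        # while we wait). If the main code is mid-update, back off.
        if self.busy:
            return False
        self.items.append(item)
        return True


shared = SharedList()
print(shared.scheduled_add(1))  # True: nothing else is touching the list
shared.busy = True              # pretend a REPL call is part-way through an update
print(shared.scheduled_add(2))  # False: the callback backs off instead of blocking
shared.busy = False
shared.add_from_repl(3)
print(shared.items)             # [1, 3]
```

With real threads touching the same data, this flag is not a substitute for a proper lock.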

mgag
Posts: 5
Joined: Sat Aug 04, 2018 4:07 pm

Re: Lock causing intermittent crash

Post by mgag » Fri Oct 11, 2019 5:42 pm

Thank you for the response. In my project, I am using one thread, but it doesn't access the shared function/variable. I didn't realise Lock wasn't for the general case (without threads). I always try to make a minimal code snippet to repro an issue, and I found that the thread I was creating didn't affect the crash.

I was worried about the REPL (the external PC polling the PyBoard) and the internal Timer (updating the state of a variable) both touching the same data, so I added that lock. I removed the lock, and in limited testing over the last 24 hours I didn't see an issue. I don't understand what the REPL looks like to a thread running on MicroPython. Is the REPL another thread? How does the REPL get scheduled relative to currently running threads? Does the REPL "interrupt" a currently running thread?

BTW, on the PyBoard (STM32) I have often ended up with a corrupted filesystem, but if I always hold the reset button while removing power from the PyBoard, I have yet to get a corrupted filesystem. Maybe this will help someone with the same issue; I have seen a few posts about it.

Finally, regarding the emergency buffer: is there a way to read it? Or is there a recipe for debugging crashes that would give more information to the core developers?

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: Lock causing intermittent crash

Post by pythoncoder » Sat Oct 12, 2019 7:47 am

Have you considered using uasyncio instead of threading? Unless there is a specific reason for threading, uasyncio is smaller, more efficient, faster and easier to debug.
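A minimal sketch of the uasyncio approach, written against CPython's asyncio (uasyncio offers a similar but smaller API, and what is available depends on the firmware version). `Store`, `timer_task` and `command_task` are hypothetical names, and the PC side is simulated with a fixed list of commands instead of the USB serial link. Because coroutines only switch at an `await`, an update with no `await` inside it cannot be interleaved with another task's update, so no lock is needed:

```python
import asyncio


class Store:
    """Hypothetical shared state touched by two coroutines."""

    def __init__(self):
        self.items = []


async def timer_task(store, period, count):
    # Plays the role of the pyb.Timer + micropython.schedule() pair.
    for n in range(count):
        store.items.append(("timer", n))  # no await inside the update
        await asyncio.sleep(period)


async def command_task(store, commands):
    # Plays the role of requests arriving from the PC (simulated here).
    for cmd in commands:
        store.items.append(("pc", cmd))   # no await inside the update
        await asyncio.sleep(0)            # yield to the other task


async def main():
    store = Store()
    await asyncio.gather(
        timer_task(store, 0.01, 3),
        command_task(store, ["bar", "baz"]),
    )
    return store.items


items = asyncio.run(main())
print(sorted(items))
```

The exact interleaving of "timer" and "pc" entries may vary, which is why the result is sorted before printing; what cannot happen is one update being cut in half by the other task.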

Regarding file corruption: this can occur if you enable mass storage (MSC) mode, because the USB mass storage spec assumes that the device is a dumb disk drive rather than a controller capable of modifying the disk's contents. I always disable MSC mode. I won't say corruption never occurs: it might after a crash while writing to the filesystem. But it's vanishingly rare.

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: Lock causing intermittent crash

Post by jimmo » Sun Oct 13, 2019 10:51 pm

mgag wrote:
Fri Oct 11, 2019 5:42 pm
I was worried about the REPL (from external PC polling the PyBoard), and the internal Timer (updating the state of a variable), so I added that lock. I removed the lock, and from limited testing the last 24 hours, I didn't see an issue. I don't understand what the REPL looks like to a running thread on the MicroPython - I mean, is the REPL another thread? How does the REPL get scheduled relative to currently running threads? Does the REPL "interrupt" a currently running thread?
The REPL runs on the "main" thread, which always exists. If you imagine MicroPython as just a library that executes snippets of Python code, it looks something like this:

Code:

def main():
  run_file('boot.py')
  run_file('main.py')
  while True:
    line = read_line()
    print(exec(line))

So if you start another thread, then that while loop will run concurrently with your thread. So yes, the REPL does interrupt a currently running thread (at a very low level).

Peter referenced micropython.schedule -- at various points in a thread's execution in the MicroPython VM, it will check for scheduled tasks (a good example is when the REPL would otherwise block waiting for input, as well as various points in the program flow). So this interrupts your Python code. But the main thread can't be both running your regular Python code _and_ a scheduled function at the same time, the whole scheduled function runs to completion before returning back to whatever it was doing before.
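The behaviour described above can be illustrated with a hypothetical, much simplified model. It is not MicroPython's implementation; `ToyScheduler`, `run_repl_command`, `timer_callback` and `irq_at_step` are made-up names. Main-thread code (such as a REPL command) runs in small steps, the VM checks for pending callbacks at check points between them, and each callback runs to completion before the interrupted code resumes. A callback that schedules another one only adds to the queue, so scheduled functions never preempt each other:

```python
from collections import deque


class ToyScheduler:
    """Hypothetical model of micropython.schedule() semantics (not the real VM)."""

    def __init__(self):
        self.pending = deque()
        self.log = []

    def schedule(self, func, arg):
        # Like micropython.schedule(): queue the call instead of running it now.
        self.pending.append((func, arg))

    def check_pending(self):
        # Run queued callbacks one at a time, each to completion.
        while self.pending:
            func, arg = self.pending.popleft()
            func(arg)

    def run_repl_command(self, steps, irq_at_step):
        # Main-thread code (e.g. a REPL command) executing in small steps.
        for step in range(steps):
            self.log.append(("repl", step))
            if step == irq_at_step:
                # A timer IRQ fires here and schedules the callback.
                self.schedule(self.timer_callback, step)
            self.check_pending()  # VM check point between steps

    def timer_callback(self, arg):
        self.log.append(("cb start", arg))
        if arg < 100:
            # Queued, not nested: it runs only after this callback returns.
            self.schedule(self.timer_callback, arg + 100)
        self.log.append(("cb end", arg))


sched = ToyScheduler()
sched.run_repl_command(steps=3, irq_at_step=1)
print(sched.log)
```

In the resulting log the callbacks appear between REPL steps 1 and 2 (so they did interrupt the main-thread code), and the second callback starts only after the first has ended.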

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: Lock causing intermittent crash

Post by jimmo » Sun Oct 13, 2019 10:53 pm

mgag wrote:
Fri Oct 11, 2019 5:42 pm
And finally, there is the emergency buffer, is there a way to read that buffer? Or, is there a recipe for debugging crashes to give more information that would be helpful to core developers?
I assume you're referring to the emergency exception buffer? This just gives a little bit of pre-allocated space to create the exception and its message. For many exceptions, generating the message text requires a sprintf() (e.g. "couldn't convert %s to %s"), so sprintf needs a buffer to write to. The emergency exception buffer is just that, and you should never need to see its contents directly, because you'll see the actual exception that was allocated inside it.
