Simple scenario with button/IRQ/write to file that leads to ESP crash + FS coruption

General discussions and questions abound development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
Post Reply
cristian.spiescu
Posts: 7
Joined: Wed Aug 28, 2019 5:07 am

Simple scenario with button/IRQ/write to file that leads to ESP crash + FS coruption

Post by cristian.spiescu » Sat Aug 31, 2019 8:07 pm

Hello,

I have a very annoying problem that I discovered in the last phase of my project, when I replaced some "print()" with logging to a file. I have reduced the program to a very simple issue that illustrates the unwanted behavior:

1) I push a button
2) it's associated IRQ handler is triggered; I do "micropython.schedule()" for the actual handler
3) the actual handler writes something to a file

4) If I press the button a couple of times, then my ESP8266 reboots with the message:
NOTE: on the small program below, one needs to press rather quickly on the button in order to reproduce. On my big program, the issue was more immediate.

===========================================

ets Jan 8 2013,rst cause:2, boot mode:(3,7)

load 0x40100000, len 31024, room 16
tail 0
chksum 0xb4
load 0x3ffe8000, len 1104, room 8
tail 8
chksum 0x74
load 0x3ffe8450, len 824, room 0
tail 8
chksum 0x45
csum 0x45

===========================================

5) After reboot, a lot of strange entries appear on the file system:

===========================================
>>> import os
>>> os.listdir()
['\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', ...
===========================================

6) Some times the initial files still exist on the file system. But most of the time they disappear. And I need to reflash the ESP with the Micropython image.

A simple program that reproduces this behavior:

===========================================
import uos, machine
import gc
gc.collect()

import machine
import micropython

pinRobinetButon = machine.Pin(13, machine.Pin.IN, machine.Pin.PULL_UP)

import utime

def irqHandler(pin):
micropython.schedule(sch, None)

def sch(p):
d = utime.localtime();
str = "a text"
str = "{:02d}:{:02d}:{:02d} {}".format(d[3], d[4], d[5], str);
print(str)
with open("logfile.0", "a+") as f:
f.write(str)
f.write("\n")

pinRobinetButon.irq(irqHandler, machine.Pin.IRQ_FALLING)
===========================================

Is there something illegal that I'm doing in the above code? Or maybe have you got any other hints?

Thanks a lot in advance!
Best regards,
Cristian

User avatar
ghayne
Posts: 42
Joined: Sat Jun 08, 2019 9:31 am
Location: Cwmllynfell, Wales

Re: Simple scenario with button/IRQ/write to file that leads to ESP crash + FS coruption

Post by ghayne » Sun Sep 01, 2019 8:21 pm

def irqHandler(pin):
micropython.schedule(sch, None)

You are calling sch before it is defined!

JoeCharlieH
Posts: 8
Joined: Thu Dec 27, 2018 10:59 pm

Re: Simple scenario with button/IRQ/write to file that leads to ESP crash + FS coruption

Post by JoeCharlieH » Sun Sep 01, 2019 10:37 pm

Well, beside the response above, you shouldn't do write operations inside an IRQ.

What I usually do is to change a variable inside the IRQ and check if it is True or False inside your main loop:

Code: Select all


write_flag = False
def irqHandler(pin):
	global write_flag
	write_flag = True

def sch():
d = utime.localtime();
str = "a text"
str = "{:02d}:{:02d}:{:02d} {}".format(d[3], d[4], d[5], str);
print(str)
with open("logfile.0", "a+") as f:
	f.write(str)
	f.write("\n")

pinRobinetButon.irq(irqHandler, machine.Pin.IRQ_FALLING)

while True:
	if write_flag:
		irq_state = machine.disable_irq()
		write_flag = False
		sch()
		machine.enable_irq(irq_state)
Remember that IRQs must do fast operations that don't allocate memory.

More info in here.

User avatar
jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: Simple scenario with button/IRQ/write to file that leads to ESP crash + FS coruption

Post by jimmo » Mon Sep 02, 2019 2:48 am

ghayne wrote:
Sun Sep 01, 2019 8:21 pm
def irqHandler(pin):
micropython.schedule(sch, None)

You are calling sch before it is defined!
This is fine in Python. Think of it as the "module" executing first. It defines a global called "irqHandler", then a global called "sch".

Then later when irqHandler executes later, "sch" is in globals.
JoeCharlieH wrote:
Sun Sep 01, 2019 10:37 pm
Well, beside the response above, you shouldn't do write operations inside an IRQ.
...
Remember that IRQs must do fast operations that don't allocate memory.
This is true in general, but:
- on ESP8266 pin IRQs are "soft" (i.e. they execute by the scheduler)
- the code posted uses micropython.schedule anyway, so the write code is definitely actually happening outside of irq context.

So, on ESP8266 the schedule() is actually unnecessary. And on other ports you can set the "hard=False" kwarg to Pin.irq() to make this happen automatically.

The actual issue here is that when the pin IRQ is enabled, the handler for it (written in C, part of hte micropython firmware) is in code that lives in flash. So if you're currently writing to the flash, it fails. i.e. it doesn't matter what you do in the irq handler, just registering it is enough to cause this issue. The correct solution for this is for C irq handlers to be stored in iram, but this is a bit tricky for MicroPython because lots of code would then have to live in iram.
JoeCharlieH wrote:
Sun Sep 01, 2019 10:37 pm
What I usually do is to change a variable inside the IRQ and check if it is True or False inside your main loop:
This is also great advice in general, but unfortunately doesn't solve the issue here (your main loop could be writing when the interrupt fires).

So I think the only workaround you have is to disable irqs while writing to flash:

Code: Select all

def sch(x):
    s = machine.disable_irq()
    with open('logfile.0', 'a+') as f:
        f.write(...)
    machine.enable_irq(s)

cristian.spiescu
Posts: 7
Joined: Wed Aug 28, 2019 5:07 am

Re: Simple scenario with button/IRQ/write to file that leads to ESP crash + FS coruption

Post by cristian.spiescu » Mon Sep 02, 2019 5:07 am

Thank you all for the replies.
So, on ESP8266 the schedule() is actually unnecessary. And on other ports you can set the "hard=False" kwarg to Pin.irq() to make this happen automatically.
W/ and w/o the schedule(): I got the impression that there was a difference. I.e. w/o schedule() it crashed faster. But maybe this is just a subjective impression.
So I think the only workaround you have is to disable irqs while writing to flash:
I've tried to do that. I tried calling "machine.disable_irqs()" from REPL, after a fresh boot to see what it does; but the board reboots. This seems very strange.

cristian.spiescu
Posts: 7
Joined: Wed Aug 28, 2019 5:07 am

Re: Simple scenario with button/IRQ/write to file that leads to ESP crash + FS coruption

Post by cristian.spiescu » Mon Sep 02, 2019 5:17 am

UPDATE: calling "disable_irq()" messes somehow with the REPL + other things. The board doesn't accept my input for a few seconds, and then it reboots. Probably the watchdog timer.

BUT: using "disable_irq()" + "enable_irq()" in my code as suggested does the trick indeed.

Many thanks!

User avatar
jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: Simple scenario with button/IRQ/write to file that leads to ESP crash + FS coruption

Post by jimmo » Mon Sep 02, 2019 9:30 am

cristian.spiescu wrote:
Mon Sep 02, 2019 5:17 am
UPDATE: calling "disable_irq()" messes somehow with the REPL + other things. The board doesn't accept my input for a few seconds, and then it reboots. Probably the watchdog timer.
Yep, you need IRQs enabled for most things to function (i.e. the repl needs to get notified that new characters are available), and yep you could be right that a watchdog is what's getting you here. I don't know much about ESP8266...

You should only disable IRQs for very short periods of time.

Glad it worked!

Post Reply