[workaround used] esp8266 gets stuck periodically

kevinkk525
Posts: 969
Joined: Sat Feb 03, 2018 7:02 pm

Re: Asyncio gets stuck periodically

Post by kevinkk525 » Thu May 10, 2018 8:29 am

Is 0x1000 seconds (4096 s, i.e. 1 h 8 min 16 s) something special in the firmware? I can't measure the downtime exactly because I rely on the MQTT last-will publication, and my MQTT client has a timeout of 60 seconds. By that measure the microcontroller was offline for 1 h 10 min 31 s, but that also includes a WLAN reconnection, which takes some time, so an actual outage of 1 h 8.27 min is quite possible.
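For reference, the arithmetic behind that comparison as a quick Python check. It assumes the 60 s MQTT client timeout simply adds to the downtime seen via the last-will message; whatever is left over would be the WLAN reconnection time.

```python
def hms(seconds):
    """Split a duration in seconds into (hours, minutes, seconds)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


suspect = 0x1000                      # 4096 s, the suspected firmware constant
observed = 1 * 3600 + 10 * 60 + 31    # 1 h 10 min 31 s, from the last-will timestamps
mqtt_timeout = 60                     # assumption: adds directly to the observed downtime

print(hms(suspect))                         # (1, 8, 16), i.e. about 1 h 8.27 min
print(observed - mqtt_timeout - suspect)    # 75 s left over for the WLAN reconnect
```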
Kevin Köck
Micropython Smarthome Firmware (with Home-Assistant integration): https://github.com/kevinkk525/pysmartnode

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Post by pythoncoder » Thu May 10, 2018 9:19 am

No, it seems an unlikely value to be used in code. I'm pretty sure it's just coincidence.
Peter Hinch
Index to my micropython libraries.

Post by kevinkk525 » Tue May 22, 2018 8:28 am

After more than a week without problems, the first of my microcontrollers went offline for 1 h 10 min again. I had changed the uasyncio library to report every coroutine execution that takes longer than a few seconds.
My log was empty, though, apart from the message that coroutine execution had stopped for at least 1 h.
This means that none of my coroutines (including @pythoncoder's MQTT library) caused the problem.
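The instrumentation described above can be approximated like this. It is a minimal sketch, not the actual patch to uasyncio: it assumes generator-based coroutines (as uasyncio used at the time) and a scheduler that resumes each one with `send(None)`; `timed_step` and the `log` callback are hypothetical names.

```python
try:
    from time import ticks_ms, ticks_diff   # MicroPython
except ImportError:                          # CPython fallback for off-target testing
    import time

    def ticks_ms():
        return int(time.monotonic() * 1000)

    def ticks_diff(new, old):
        return new - old


def timed_step(coro, threshold_ms, log):
    """Resume a generator-based coroutine once and log the step if it was slow.

    A step that runs longer than threshold_ms did not yield back to the
    scheduler during that time, so it blocked every other coroutine.
    """
    start = ticks_ms()
    try:
        return coro.send(None)
    finally:
        # Runs even when the coroutine finishes (StopIteration) or raises.
        elapsed = ticks_diff(ticks_ms(), start)
        if elapsed > threshold_ms:
            name = getattr(coro, "__name__", repr(coro))
            log("%s blocked the scheduler for %d ms" % (name, elapsed))
```

An empty log with this kind of hook, combined with a detected stall, points away from the coroutines themselves.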

This leaves me with only two possibilities:
1) asyncio gets stuck somewhere
2) esp-idf has a bug that stops execution (but not timers) for a certain amount of time

I don't know how to check either of them, though. Interestingly, I also seem to be the only one having this problem.

Post by pythoncoder » Tue May 22, 2018 9:09 am

For what it's worth I have a high degree of confidence in uasyncio: I've been using it quite intensively for some time and only found one issue, fixed by @pfalcon long ago. His work is top-quality. I'm familiar with the code and can't imagine a mechanism for this long delay. Of course that doesn't mean there isn't one, but I think the odds are against it.

One way to tackle this is to write very simple test cases, but inevitably testing is going to take some time. You might start with an absolutely minimal case: ignore the WiFi, flash an LED and test for an outage. Then gradually increase capability, e.g. accessing some local resource on WiFi. Hopefully you'll identify what causes the breakage.
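A minimal version of that first test case might look like the sketch below: a coroutine blinks an LED and records any gap where the scheduler failed to resume it in time. `heartbeat`, `toggle` and the parameter values are hypothetical; the CPython fallback exists only so the logic can be exercised off-target.

```python
try:
    import uasyncio as asyncio               # MicroPython
    from time import ticks_ms, ticks_diff
except ImportError:                          # CPython fallback for off-target testing
    import asyncio
    import time

    def ticks_ms():
        return int(time.monotonic() * 1000)

    def ticks_diff(new, old):
        return new - old


async def heartbeat(toggle, outages, period_ms=500, slack_ms=2000, beats=None):
    """Toggle an LED every period_ms and record gaps longer than period_ms + slack_ms.

    toggle:  zero-argument callable that flips the LED
    outages: list that receives the length (ms) of every over-long gap
    beats:   stop after this many beats (None = run forever)
    """
    last = ticks_ms()
    n = 0
    while beats is None or n < beats:
        await asyncio.sleep(period_ms / 1000)
        now = ticks_ms()
        gap = ticks_diff(now, last)
        if gap > period_ms + slack_ms:
            outages.append(gap)   # the scheduler did not resume us in time
        last = now
        toggle()
        n += 1
```

On the board you might pass `led = machine.Pin(2, machine.Pin.OUT)` with `toggle=lambda: led.value(not led.value())` (pin 2 is only an example; check your board's LED pin), keep WiFi inactive via `network.WLAN(network.STA_IF).active(False)` and likewise for `network.AP_IF`, and run it with `asyncio.get_event_loop().run_until_complete(heartbeat(...))`.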

Post by kevinkk525 » Tue May 22, 2018 10:34 am

I actually trust uasyncio too. There isn't much to the code, and the modules it uses (just a deque and utimeq) shouldn't cause any long delays. I don't use any uasyncio streams either, so there is not much that could cause problems.
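To illustrate how little machinery that is, here is a toy model of the design in desktop (CPython) Python. It is not uasyncio's code: `heapq` stands in for MicroPython's `utimeq`, time is simulated, and tasks are generators that yield a delay in milliseconds (or `None` to be rescheduled immediately). `run` is a hypothetical name.

```python
import heapq
from collections import deque


def run(tasks):
    """Run generator tasks to completion on a simulated clock; return a trace.

    runq holds ready tasks in FIFO order; waitq is a heap ordered by
    wake-up time, playing the role of uasyncio's utimeq.
    """
    runq = deque(tasks)
    waitq = []      # entries: (wake_time_ms, sequence_no, task)
    seq = 0         # tie-breaker so equal wake times keep insertion order
    now = 0
    trace = []      # (time_ms, task_name) for every resumption
    while runq or waitq:
        # Move every task whose wake-up time has arrived onto the run queue.
        while waitq and waitq[0][0] <= now:
            runq.append(heapq.heappop(waitq)[2])
        if not runq:
            now = waitq[0][0]   # idle: jump the simulated clock forward
            continue
        task = runq.popleft()
        trace.append((now, task.__name__))
        try:
            delay = next(task)
        except StopIteration:
            continue            # task finished; drop it
        if delay:
            seq += 1
            heapq.heappush(waitq, (now + delay, seq, task))
        else:
            runq.append(task)
    return trace
```

For example, one task that sleeps 100 ms twice and another that sleeps 150 ms once are resumed at 0, 0, 100, 150 and 200 ms.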

One thing I just remembered is that these outages occurred more often with my old router, and more often the more devices were connected. But it's not a reconnection problem, as that wouldn't prevent coroutines from executing (unless there's a bug in esp-idf or the hardware). Sockets can't be the problem either, as my changes to uasyncio would have detected them.

Therefore the problem most likely lies either in esp-idf or in the hardware, and I doubt that it's caused by anything in MicroPython.

Testing small parts of the hardware with simple tests is going to be a long and painful adventure. One of my controllers has now been running for over 3 weeks without an outage; there is no way to tell how long it will take until an outage occurs.
Sadly there's no WDT for the esp8266, so I can't just reset the board if it doesn't feed the watchdog every second. But I could check whether interrupts are still being processed during an outage.
I guess I'll have to set up a battery of esp8266 boards for testing somewhere I won't get annoyed by blinking LEDs :D

Thanks for your answer @pythoncoder

marfis
Posts: 215
Joined: Fri Oct 31, 2014 10:29 am
Location: Zurich / Switzerland

Post by marfis » Wed May 23, 2018 5:18 am

If there is no hardware WDT available, you could use one of the hardware timers with an IRQ callback set up to trigger e.g. every second.

In the callback you could increment a global counter variable and call machine.reset() if the counter exceeds a predefined value.

In one of your foreground coroutines you reset this counter to zero, which is equivalent to feeding a WDT.

It's not a perfect replacement for a hardware WDT, but close enough in most cases.
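A minimal sketch of this software watchdog follows. Note that it uses the ESP8266 port's virtual timer `machine.Timer(-1)` in place of a hardware timer, so the caveat below about how the ESP dispatches timer callbacks applies. `SoftWDT`, `limit` and `start` are illustrative names; the counting logic is kept separate from the timer so it can be tested off-target.

```python
try:
    import machine
except ImportError:          # off-target (CPython): only the counting logic is usable
    machine = None


class SoftWDT:
    """Software watchdog: a periodic timer callback counts up, the application feeds it."""

    def __init__(self, limit, reset=None):
        self.limit = limit       # timer ticks allowed between two feeds
        self.count = 0
        # On MicroPython default to machine.reset(); off-target a reset callable is required.
        self.reset = reset if reset is not None else machine.reset

    def feed(self):
        """Call from a foreground coroutine; equivalent to feeding a hardware WDT."""
        self.count = 0

    def tick(self, _timer=None):
        """Timer callback: kept short, it only increments and compares an int."""
        self.count += 1
        if self.count > self.limit:
            self.reset()


def start(wdt, period_ms=1000):
    """Fire wdt.tick every period_ms using a virtual timer (ESP8266 port)."""
    tim = machine.Timer(-1)
    tim.init(period=period_ms, mode=machine.Timer.PERIODIC, callback=wdt.tick)
    return tim
```

With `wdt = SoftWDT(limit=90)` and `start(wdt)`, a coroutine running `while True: wdt.feed(); await asyncio.sleep(10)` keeps the board alive, and the board resets roughly 90 s after that coroutine stops running. The limit of 90 is an arbitrary example.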

Post by marfis » Wed May 23, 2018 5:22 am

That said, this works best on bare-metal ports.

The ESP RTOS might add a software layer (and thus a delay) between the timer's hardware IRQ and the actual invocation of the Python callback.

Post by kevinkk525 » Wed May 23, 2018 6:56 am

In my current test I included an IRQ handler that records the time of each execution, and a coroutine that checks the interval between the last two interrupt executions and, if it is wrong, logs it to MQTT. That way I'll know whether interrupts also get stuck or whether I can use them as a watchdog.
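A minimal sketch of that test setup, assuming a periodic timer calls `irq()` and the event loop runs `watch()`. `IrqMonitor`, `watch` and the `report` callback (which would publish to MQTT) are hypothetical names, not part of any library.

```python
try:
    import uasyncio as asyncio               # MicroPython
    from time import ticks_ms, ticks_diff
except ImportError:                          # CPython fallback for off-target testing
    import asyncio
    import time

    def ticks_ms():
        return int(time.monotonic() * 1000)

    def ticks_diff(new, old):
        return new - old


class IrqMonitor:
    """IRQ side records timestamps; coroutine side checks their spacing."""

    def __init__(self, period_ms, tolerance_ms):
        self.period_ms = period_ms
        self.tolerance_ms = tolerance_ms
        self.prev = None      # timestamp of the second-to-last IRQ
        self.last = None      # timestamp of the last IRQ
        self.checked = None   # value of self.last at the previous check

    def irq(self, _timer=None):
        """Timer callback: only stores timestamps."""
        self.prev = self.last
        self.last = ticks_ms()

    def check(self):
        """Return the interval (ms) between the last two IRQs if it is off, else None.

        Each pair of IRQs is evaluated once. Not atomic: an IRQ firing
        mid-check may skew one reading.
        """
        if self.prev is None or self.last == self.checked:
            return None
        self.checked = self.last
        interval = ticks_diff(self.last, self.prev)
        if abs(interval - self.period_ms) > self.tolerance_ms:
            return interval
        return None


async def watch(monitor, report, every_s=1):
    """Poll the monitor and hand bad intervals to report() (e.g. an MQTT publish)."""
    while True:
        bad = monitor.check()
        if bad is not None:
            report(bad)
        await asyncio.sleep(every_s)
```

On the ESP8266 the timer could be set up with `machine.Timer(-1).init(period=1000, mode=machine.Timer.PERIODIC, callback=m.irq)`. If the interrupts stop during an outage, the first check afterwards reports the long interval; if they keep firing, nothing is reported even though the coroutines were stuck.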

I guess I could have set it up as a watchdog in the first place, since I can see the reboots; I didn't think of that. Thanks for the hint.
If that works, I'll just use this watchdog instead of searching for the problem.

Post by pythoncoder » Wed May 23, 2018 7:03 am

@marfis Good idea. That would demonstrate whether the lockup is at the Python level or in the RTOS - if the latter, presumably the timer would be stuck. It would potentially save time if the code reported each time it (re)started.

Post by kevinkk525 » Wed May 23, 2018 7:51 am

That's how my test is designed: it tells me whether the interrupt stopped executing during the outage and logs it to MQTT.

The WDT could record a restart in a file and check it after boot, but my esp8266 boards don't have a filesystem. I'll still notice restarts, because the boards request a configuration after boot and that request is logged. Also, the WDT should only trigger after a period longer than the MQTT ping time, so that the controller is briefly shown as offline. This is of course very specific to my use case. In general, I think the WDT could write the reset to a file, read it back on the next start and then log it to MQTT.
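The general approach (write the reset to a file, read it back on the next boot) might look like this sketch, assuming a board with a filesystem. `MARKER`, `record_reset` and `pop_reset_reason` are hypothetical names. Whether file I/O is safe inside the timer callback depends on how the port dispatches it, which is not established here; on boards without a filesystem some other store that survives a reset would be needed (the ESP8266's `machine.RTC().memory()` may be an option, but that is an assumption worth checking against the port's documentation).

```python
try:
    import uos as os          # MicroPython
except ImportError:
    import os                 # CPython fallback for off-target testing

MARKER = "wdt_reset.txt"      # hypothetical file name


def record_reset(reason, path=MARKER):
    """Write the reason just before the watchdog calls machine.reset()."""
    with open(path, "w") as f:
        f.write(reason)


def pop_reset_reason(path=MARKER):
    """Call once after boot: return the stored reason and delete the marker, or None."""
    try:
        with open(path) as f:
            reason = f.read()
    except OSError:            # no marker: normal power-up or ordinary reboot
        return None
    os.remove(path)
    return reason
```

After connecting, the boot code would publish the result of `pop_reset_reason()` to MQTT whenever it isn't `None`.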