Recovering from failure

tony_c
Posts: 3
Joined: Sat Nov 18, 2017 3:44 pm

Post by tony_c » Sun Jun 24, 2018 2:10 am

My ESP8266 seems to hang after a couple of days. I can't replicate the issue unless the board is in actual use, which means I cannot hook a computer up to the serial port. Is there any way to get a crash log? The problem doesn't seem to be caused by WiFi, as the board reconnects if I restart my router. Is it possible to have an external circuit monitor for failure (e.g. via a keep-alive signal) and automatically reset the device?

kevinkk525
Posts: 969
Joined: Sat Feb 03, 2018 7:02 pm

Post by kevinkk525 » Sun Jun 24, 2018 8:00 am

I describe a similar problem in this thread: viewtopic.php?f=16&t=4706&start=20
My ESP8266 hangs after 1-2 weeks of uptime, stays unresponsive for about 1 hour 10 minutes, then recovers.

You could try a watchdog implementation (a software WDT driven by timer interrupts only, as the esp8266 does not have a hardware WDT):

import gc
import uasyncio as asyncio
import machine
from pysmartnode.utils import sys_vars

gc.collect()
from pysmartnode import logging

log = logging.getLogger("WDT")


class WDT:
    def __init__(self, id=0, timeout=120):
        self._timeout = timeout / 10
        self._counter = 0
        self._timer = machine.Timer(id)
        self.init()
        asyncio.get_event_loop().create_task(self._resetCounter())
        if sys_vars.hasFilesystem():
            try:
                with open("watchdog.txt", "r") as f:
                    if f.read() == "True":
                        log.warn("Reset reason: Watchdog")
            except Exception as e:
                print(e)  # file probably just does not exist
            try:
                with open("watchdog.txt", "w") as f:
                    f.write("False")
            except Exception as e:
                log.error("Error saving to file: {!s}".format(e))

    def _wdt(self, t):
        self._counter += self._timeout
        if self._counter >= self._timeout * 10:
            if sys_vars.hasFilesystem():
                try:
                    with open("watchdog.txt", "w") as f:
                        f.write("True")
                except Exception as e:
                    print("Error saving to file: {!s}".format(e))
            machine.reset()

    def feed(self):
        self._counter = 0

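    def init(self, timeout=None):
        # timeout is the total watchdog timeout in seconds, as in __init__;
        # the timer itself fires every timeout / 10 seconds
        if timeout is not None:
            self._timeout = timeout / 10
        self._timer.init(period=int(self._timeout * 1000), mode=machine.Timer.PERIODIC, callback=self._wdt)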

    def deinit(self):  # will not stop coroutine
        self._timer.deinit()

    async def _resetCounter(self):
        while True:
            await asyncio.sleep(self._timeout)
            self.feed()

You can easily adapt it to your environment by removing the logging and pysmartnode references (and asyncio, if you don't use it).
Kevin Köck
Micropython Smarthome Firmware (with Home-Assistant integration): https://github.com/kevinkk525/pysmartnode
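
As a rough illustration of the adaptation suggested above, here is a minimal sketch of the same counter-based idea with the logging, pysmartnode and asyncio parts removed. It is not kevinkk525's code: the name SimpleWDT, the ticks and on_timeout parameters, and the use of the virtual timer id -1 are assumptions made for this example. Without asyncio, the main loop has to call feed() itself.

```python
try:
    import machine  # present on the board
except ImportError:
    machine = None  # lets the counter logic be exercised on a desktop Python


class SimpleWDT:
    """Software watchdog: a periodic timer counts ticks, feed() clears the count."""

    def __init__(self, timeout_s=120, ticks=10, on_timeout=None):
        self._ticks = ticks
        self._count = 0
        # The timer fires `ticks` times within one timeout interval.
        self._period_ms = int(timeout_s * 1000 // ticks)
        # Default action is a hard reset; on_timeout is a hook for testing.
        self._on_timeout = on_timeout or (machine.reset if machine else None)
        self._timer = None

    def start(self, timer_id=-1):
        # Assumption: -1 selects a virtual timer on the ESP8266 port.
        self._timer = machine.Timer(timer_id)
        self._timer.init(period=self._period_ms,
                         mode=machine.Timer.PERIODIC,
                         callback=self._tick)

    def stop(self):
        if self._timer is not None:
            self._timer.deinit()

    def _tick(self, _timer):
        self._count += 1
        if self._count >= self._ticks:
            self._on_timeout()

    def feed(self):
        self._count = 0
```

On the board this would be used roughly as `wdt = SimpleWDT(timeout_s=120)`, then `wdt.start()`, with `wdt.feed()` called once per pass through the main loop. Like any timer-based software watchdog, it only helps while the timer callback can still run.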

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Post by pythoncoder » Sun Jun 24, 2018 10:53 am

For a software watchdog with uasyncio you could look at this module. It provides a Delay_ms class, which runs a callback if it is not retriggered before its timeout expires.
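
To show the retriggerable-delay idea in isolation, here is a minimal sketch built on plain asyncio tasks. It is not the Delay_ms class from the module mentioned above, and its API may differ: the class name RetriggerableDelay and its methods are hypothetical. It assumes a MicroPython build where `import asyncio` works (older builds use `import uasyncio as asyncio`).

```python
import asyncio


class RetriggerableDelay:
    """Runs callback once if trigger() is not called again within duration_s."""

    def __init__(self, callback, duration_s):
        self._callback = callback
        self._duration_s = duration_s
        self._task = None

    def trigger(self):
        # Restart the countdown: cancel the pending one and start a new one.
        # Must be called while the event loop is running.
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.create_task(self._run())

    def stop(self):
        if self._task is not None:
            self._task.cancel()
            self._task = None

    async def _run(self):
        await asyncio.sleep(self._duration_s)
        self._task = None
        self._callback()  # only reached if nothing retriggered in time
```

Used as a watchdog, the callback would be something like `machine.reset`, and the application's critical coroutine would call `trigger()` on every successful pass.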

The ultimate solution is a hardware watchdog. A retriggerable monostable is repeatedly retriggered by a software-generated pulse on a pin, and it activates the hardware reset if it times out. The key advantage is that the system will recover from a total crash where the CPU stops running code.
Peter Hinch
Index to my micropython libraries.
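
The software side of the hardware watchdog described above only has to emit a pulse on a pin at regular intervals. A minimal sketch follows, assuming a monostable that is retriggered by a short high pulse; the class name WatchdogFeeder, the pulse width and the pin number in the usage note are hypothetical and depend on the actual circuit.

```python
import time


class WatchdogFeeder:
    """Pulses an output pin to retrigger an external monostable."""

    def __init__(self, pin, pulse_ms=1):
        # pin is any object with a value(level) method, e.g. a machine.Pin
        # configured as an output.
        self._pin = pin
        self._pulse_ms = pulse_ms
        self._pin.value(0)  # idle low (assumed trigger polarity: rising edge)

    def feed(self):
        # One short high pulse; call this more often than the monostable's
        # timeout, from the part of the program that proves it is alive.
        self._pin.value(1)
        time.sleep(self._pulse_ms / 1000)
        self._pin.value(0)
```

On the board, the feeder could be created with `WatchdogFeeder(machine.Pin(5, machine.Pin.OUT))` and `feed()` called from the main loop. If the code hangs, the pulses stop and the monostable's timeout pulls the reset line.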
