Recovering from failure

tony_c · Post by **tony_c** » Sun Jun 24, 2018 2:10 am

My ESP8266 seems to hang after a couple of days. I can't reproduce the issue except in actual use (i.e. when I can't hook a computer up to the serial port). Is there any way to get a crash log? The problem doesn't seem to be Wi-Fi related, since the device reconnects if I restart my router. Is it possible to have an external circuit monitor for failure (e.g. a keep-alive) and automatically reset the device?

kevinkk525 · Post by **kevinkk525** » Sun Jun 24, 2018 8:00 am

I have a similar problem in this thread: viewtopic.php?f=16&t=4706&start=20
My ESP8266 hangs after 1-2 weeks for 1 h 10 min, then recovers.

You could try a watchdog implementation (only software wdt using interrupts as the esp8266 does not have a hardware wdt):

import gc
import uasyncio as asyncio
import machine
from pysmartnode.utils import sys_vars

gc.collect()
from pysmartnode import logging

log = logging.getLogger("WDT")


class WDT:
    def __init__(self, id=0, timeout=120):
        self._timeout = timeout / 10
        self._counter = 0
        self._timer = machine.Timer(id)
        self.init()
        asyncio.get_event_loop().create_task(self._resetCounter())
        if sys_vars.hasFilesystem():
            try:
                with open("watchdog.txt", "r") as f:
                    if f.read() == "True":
                        log.warn("Reset reason: Watchdog")
            except Exception as e:
                print(e)  # file probably just does not exist
            try:
                with open("watchdog.txt", "w") as f:
                    f.write("False")
            except Exception as e:
                log.error("Error saving to file: {!s}".format(e))

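    def _wdt(self, t):
        # Timer callback: runs every self._timeout seconds (1/10 of the full timeout).
        self._counter += self._timeout
        # Ten ticks without a feed() means the full timeout has elapsed.
        if self._counter >= self._timeout * 10: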
            if sys_vars.hasFilesystem():
                try:
                    with open("watchdog.txt", "w") as f:
                        f.write("True")
                except Exception as e:
                    print("Error saving to file: {!s}".format(e))
            machine.reset()

    def feed(self):
        self._counter = 0

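    def init(self, timeout=None):
        # timeout is the full watchdog timeout in seconds, as in __init__;
        # the timer ticks every timeout / 10 seconds.
        if timeout is not None:
            self._timeout = timeout / 10
        self._timer.init(period=int(self._timeout * 1000), mode=machine.Timer.PERIODIC, callback=self._wdt)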

    def deinit(self):  # will not stop coroutine
        self._timer.deinit()

    async def _resetCounter(self):
        while True:
            await asyncio.sleep(self._timeout)
            self.feed()

You can easily adapt it to your environment by removing the log and the pysmartnode references (and asyncio if you don't use it).
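As a rough idea of what such a stripped-down version might look like, here is a minimal sketch without logging, pysmartnode or asyncio. The class name `SoftWDT`, its parameters and the `on_expire` hook are made up for this example. The timer wiring is shown only in comments because it needs the `machine` module on the device.

```python
class SoftWDT:
    """Software watchdog core: tick() runs periodically, feed() resets it.

    Hypothetical helper for illustration; not part of pysmartnode.
    """

    def __init__(self, timeout_s=120, ticks=10, on_expire=None):
        self._timeout_s = timeout_s
        self._ticks = ticks          # number of timer ticks per full timeout
        self._count = 0              # ticks seen since the last feed()
        self._on_expire = on_expire  # e.g. machine.reset on the device

    @property
    def tick_ms(self):
        # Period to use for the periodic timer, in milliseconds.
        return int(self._timeout_s / self._ticks * 1000)

    def tick(self, _timer=None):
        # Timer callback; the timer object passed by machine.Timer is ignored.
        self._count += 1
        if self._count >= self._ticks and self._on_expire is not None:
            self._on_expire()

    def feed(self):
        # Call this regularly from code that proves the application is alive.
        self._count = 0


# Assumed wiring on an ESP8266 (not runnable on a desktop Python):
#   import machine
#   wdt = SoftWDT(timeout_s=120, on_expire=machine.reset)
#   machine.Timer(-1).init(period=wdt.tick_ms,
#                          mode=machine.Timer.PERIODIC, callback=wdt.tick)
#   ...then call wdt.feed() from your main loop.
```

Keeping the counting logic separate from the timer means you can exercise it on a PC by calling `tick()` by hand before putting it on the device.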

pythoncoder · Post by **pythoncoder** » Sun Jun 24, 2018 10:53 am

For a software watchdog with uasyncio you could look at this module. It has a Delay_ms class, which runs a callback if it is not retriggered before its delay expires.
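To illustrate the retriggerable-delay idea, here is a minimal sketch. It is not the Delay_ms implementation from that module: the class name `RetriggerableDelay` and its methods are invented for this example, and it is written against standard `asyncio` and `time.monotonic()`, so it would need adapting for uasyncio on the device (for instance using `time.ticks_ms()`).

```python
import asyncio
import time


class RetriggerableDelay:
    """Run `callback` once if trigger() is not called again within duration_s.

    Hypothetical class for illustration only.
    """

    def __init__(self, duration_s, callback):
        self._duration_s = duration_s
        self._callback = callback
        self._deadline = 0.0
        self._task = None

    def trigger(self):
        # Push the deadline out; start the waiting task if none is running.
        # Must be called while the event loop is running.
        self._deadline = time.monotonic() + self._duration_s
        if self._task is None or self._task.done():
            self._task = asyncio.ensure_future(self._run())

    async def _run(self):
        # Sleep until the deadline; loop again if it was moved meanwhile.
        while True:
            remaining = self._deadline - time.monotonic()
            if remaining <= 0:
                break
            await asyncio.sleep(remaining)
        self._callback()
```

Used as a watchdog, the callback would reset the board and the healthy parts of the application would call `trigger()` periodically. Like any software watchdog, this only helps while the scheduler itself is still running.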

The ultimate solution is a hardware watchdog: a retriggerable monostable that is repeatedly retriggered by a software-generated pulse on a pin. It activates the hardware reset if it times out. The key advantage is that the system will recover even from a total crash in which the CPU stops running code.
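The software side of that arrangement could look something like the sketch below. The function name `kick`, the pulse width and the GPIO number in the comments are assumptions for illustration; the polarity and timing the monostable needs depend on the actual circuit. The function accepts any object with a `value()` method, which `machine.Pin` provides.

```python
import time


def kick(pin, width_s=0.001):
    """Emit one high pulse on `pin` to retrigger the external monostable.

    Assumes an active-high trigger input; invert the levels otherwise.
    """
    pin.value(1)
    time.sleep(width_s)
    pin.value(0)


# Assumed usage on the device (GPIO number is hypothetical):
#   import machine
#   wd_pin = machine.Pin(5, machine.Pin.OUT, value=0)
#   while True:
#       do_work()      # your application code
#       kick(wd_pin)   # must happen more often than the monostable period
```

Calling `kick()` only from the main loop, rather than from a timer interrupt, means the pulses stop whenever the application itself stalls, which is the condition the external circuit is meant to detect.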

MicroPython Forum (Archive)
