Soft Watchdog

All ESP8266 boards running MicroPython.
Official boards are the Adafruit Huzzah and Feather boards.
Target audience: MicroPython users with an ESP8266 board.
devnull
Posts: 473
Joined: Sat Jan 07, 2017 1:52 am
Location: Singapore / Cornwall
Contact:

Soft Watchdog

Post by devnull » Fri May 26, 2017 2:39 pm

What is wrong with this? It follows the documentation: https://docs.micropython.org/en/latest/ ... e.WDT.html

Code:

>>> import machine
>>> wdt = machine.WDT(timeout=2000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: function does not take keyword arguments
>>> 
This also does not work:

Code:

>>> wdt = machine.WDT(2000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError:
Last edited by devnull on Mon Oct 30, 2017 4:41 am, edited 1 time in total.

Re: WDT - TypeError: function does not take keyword arguments

Post by devnull » Sun May 28, 2017 12:42 am

OK, so it seems that the watchdog is not fully implemented on the esp8266 as it is used internally.

It appears that all you can do is trigger it by disabling interrupts; I am not sure how useful that would be.

I would like to implement a watchdog in some devices, especially battery-powered ones that are fully stand-alone and unmanned, to guard against something unexpected happening where the device exits its main loop and just sits there draining the battery until it is flat.

This is what I have come up with. I don't see any reason why it should not work, but I am looking for feedback, comments or suggestions:

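Code:

import machine as mc

class WDOG():
    """Soft watchdog: resets the board if feed() is not called each period."""

    def __init__(self):
        self.timer = mc.Timer(-1)  # -1 selects a virtual (software) timer
        self.fed = False

    def feed(self):
        self.fed = True

    def wdcb(self):
        # Optional hook, called on every timer tick; override as needed.
        pass

    def wdtcb(self, tmr):
        # Timer callback: reset if the flag was not set since the last tick.
        if not self.fed:
            mc.reset()
        self.fed = False
        self.wdcb()

    def init(self, msec=5000):
        self.timer.init(period=msec, mode=mc.Timer.PERIODIC, callback=self.wdtcb)
        self.feed()  # start fed so the first period cannot trigger a reset


'''
Usage (module saved as wdog.py):

import wdog, time
wd = wdog.WDOG()
wd.init(2000)
for i in range(10):
    wd.feed()
    time.sleep(1)
    print('feeding')
# Feeding stops here, so the board should reset a few seconds later.
'''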
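
The timing of the flag-based scheme above can be explored without hardware. What follows is a minimal sketch rather than devnull's code: `FlagWatchdog`, `tick()` and `on_timeout` are hypothetical names, and `tick()` stands in for the periodic `machine.Timer` callback so that the logic runs on a desktop Python. It suggests that the timeout fires somewhere between one and two periods after the last feed, which is worth keeping in mind when choosing the period.

```python
class FlagWatchdog:
    """Hardware-free model of a flag-based soft watchdog (hypothetical names)."""

    def __init__(self, on_timeout):
        # on_timeout stands in for machine.reset() on a real board.
        self.on_timeout = on_timeout
        self.fed = True  # start fed so the first period is a grace period

    def feed(self):
        self.fed = True

    def tick(self):
        # Stands in for the periodic timer callback.
        if not self.fed:
            self.on_timeout()
        self.fed = False


events = []
wd = FlagWatchdog(on_timeout=lambda: events.append("reset"))

wd.feed()   # main loop is alive
wd.tick()   # period 1: flag was set, so no timeout; flag is cleared
wd.tick()   # period 2: nobody fed the watchdog, so the timeout fires
print(events)
```

Passing the timeout action in as a callback keeps the model testable; on a board the callback would presumably be `machine.reset`.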

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK
Contact:

Software watchdog timers

Post by pythoncoder » Sun May 28, 2017 5:52 am

The problem with any software implementation of a WDT is that it is predicated on the assumption that the software is actually running ;) If the CPU crashes - i.e. goes into an infinite loop at the machine code level or executes the legendary HCF* opcode - the WDT will never trigger. In the case of the ESP* series there is an underlying OS: if that fails to allocate CPU time to the MicroPython VM the WDT will fail.

A true WDT needs to be implemented in hardware. That said, a software WDT is better than nothing.

* Halt and Catch Fire
Peter Hinch
Index to my micropython libraries.

Re: WDT - TypeError: function does not take keyword arguments

Post by devnull » Sun Oct 29, 2017 10:30 am

I need to pick this thread up again. I desperately need a solution on the ESP8266 for a headless, unmanned, battery-powered device that must be able to reboot itself if it crashes.

So it appears that if you disable irq, 10 seconds later the device will reset:

Code:

import machine as mc
import time

while True:
    print('line')
    time.sleep(1)
    mc.disable_irq()  # interrupts stay off from the first pass onward

So the code continues to run (even though you can't interrupt it). If this time could be extended to, say, 1 minute, that would solve the problem: when my device wakes up it is only awake for 10 to 30 seconds, unless it crashes, which is normally due to a dropped connection.

Any suggestions on how this could be achieved using disable_irq(), without the watchdog being fully implemented?

Re: WDT - TypeError: function does not take keyword arguments

Post by devnull » Mon Oct 30, 2017 1:23 am

OK, I understand that no software watchdog is foolproof, but can this be made more 'robust'?

lib/wdog.py

Code:

import machine as mc

class WDOG():

    def __init__(self):
        self.timer = mc.Timer(-1)  # virtual (software) timer
        self.fed = False

    def feed(self):
        self.fed = True

    def trig(self):
        # Action taken on timeout; replace this attribute to customise it.
        mc.reset()

    def wdtcb(self, tmr):
        # Timer callback: fire the trigger if feed() was not called
        # during the last period.
        if not self.fed:
            self.deinit()
            self.trig()
        self.fed = False

    def deinit(self):
        self.timer.deinit()

    def init(self, msec=5000):
        self.fed = False
        self.timer.init(period=msec, mode=mc.Timer.PERIODIC, callback=self.wdtcb)

wdogtest.py:

Code:

import wdog, time, machine

def ptrig():
    # Stand-in trigger: only reset once the `reset` flag has been armed.
    if reset:
        machine.reset()
    print('triggered')

reset = False
wd = wdog.WDOG()
wd.trig = ptrig  # instance attribute, so it is called without self
wd.init(2000)
for i in range(10):
    wd.feed()
    time.sleep(1)
    print('feeding')

reset = True
print('reset in 10 secs..')
wd.init(10000)

Re: Soft Watchdog

Post by pythoncoder » Mon Oct 30, 2017 8:19 am

I'm not sure much can be done to make it more robust. It will catch infinite loops but I doubt it would catch the case where the underlying RTOS locks up - for that you'd need a hardware timer. Does the ESP8266 have one?

I'd be interested to know why you're experiencing crashes. In my testing with recent versions of MicroPython, the ESP8266 can be resilient but it does require attention to detail at the application level. The problem, as you're doubtless aware, is that WiFi networks are not like wired ones. They can fail at any time owing to radio interference or loss of signal. This can cause blocking sockets to block indefinitely and protocols to fail. A blocking socket is merely doing what blocking sockets do: hanging until its operation is complete. This can be an arbitrarily long time if the AP has gone down or the ESP8266 has moved out of range. Of course timeouts can be used but it's hard to know how long to set these to. Code must be designed to cope with a timeout on all socket operations.
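
To make the point about timeouts concrete, here is a minimal sketch of a receive that cannot block indefinitely. It is written against the standard `socket` API and run on a desktop Python; `recv_with_timeout` is a hypothetical helper, the 0.2 s value is arbitrary, and the silent local listener is only a contrived stand-in for an access point that has gone away. On CPython the timeout surfaces as `socket.timeout`, a subclass of `OSError`; catching `OSError` is assumed to cover the MicroPython case as well.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout_s):
    """Return received bytes, or None if nothing arrives within timeout_s."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except OSError:
        # Covers the timeout (and any other socket error): treat as "no data".
        return None


# Contrived silent peer: a local listener that lets the connection
# complete through its backlog but never sends anything.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket()
client.connect(server.getsockname())

result = recv_with_timeout(client, 64, 0.2)
print("timed out" if result is None else result)

client.close()
server.close()
```

Every other socket operation (connect, send, DNS lookup) would need the same treatment for the application as a whole to be free of indefinite blocking.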

In my view the solution is to code using nonblocking sockets and uasyncio, and to ensure that protocols can never hang for indefinite periods. In testing my "resilient" asynchronous MQTT implementation I never encountered hangs despite extensive testing such as moving the unit slowly out of range of the AP and then gradually returning it. It has to be said that achieving this was non-trivial.
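
As a rough illustration of bounding an asynchronous operation, the sketch below wraps a coroutine in `asyncio.wait_for`. It uses the standard `asyncio` API as found in CPython; recent MicroPython builds ship a largely compatible `asyncio` module, but the uasyncio of the time may differ, so treat this as an assumption rather than a description of the MQTT library mentioned above. `bounded`, `quick_peer` and `stalled_peer` are hypothetical names.

```python
import asyncio


async def quick_peer():
    # Stands in for a network operation that completes promptly.
    await asyncio.sleep(0)
    return "ok"


async def stalled_peer():
    # Stands in for a network operation that effectively never completes.
    await asyncio.sleep(3600)
    return "data"


async def bounded(coro, timeout_s):
    """Await coro, returning None instead of hanging past timeout_s."""
    try:
        return await asyncio.wait_for(coro, timeout_s)
    except asyncio.TimeoutError:
        return None


async def main():
    fast = await bounded(quick_peer(), 1)      # completes in time
    slow = await bounded(stalled_peer(), 0.1)  # cancelled after 0.1 s
    return fast, slow


fast, slow = asyncio.run(main())
print(fast, slow)
```

The stalled coroutine is cancelled when the timeout expires, so the rest of the application keeps running instead of waiting on it.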

So my suspicion is that your "crashes" may be caused by sockets blocking. If you are unwilling to go down the asynchronous route I'd suggest testing your software watchdog with a socket contrived to block indefinitely.

This is not to invalidate the idea of a watchdog. In my use of an ESP8266 to bring MQTT to the Pyboard, the Pyboard code can reset the ESP8266 if it fails. It's a normal precaution; but in testing its current incarnation this never actually occurred.

Re: Soft Watchdog

Post by devnull » Mon Oct 30, 2017 9:41 am

Peter, Thanks so much for taking the time to respond in such detail.

I've been using PIC micros for probably 20 years now, and still use them for anything that needs 100% reliability, low resources and little or no connectivity other than serial. They simply never crash, but they don't have to deal with sockets.

I have a few ESP8266 devices deployed on the other side of the world. Like many others, they are clients that wake up periodically, do something, report their data and then go to sleep again.

At the beginning of October, one of them suddenly came online and drained its entire battery, after running for about 7 of the 9 months I had estimated.

Because this happens so infrequently, it is difficult to know what the cause is, so this is something of a last resort: at least there is something to make sure it does not do the same again and drain its battery before it is due.

My suspicion is that it is socket related, and that the loss of connection at some critical stage results in the 'crash'. It's a shame that you cannot have a global try/except that can handle any and all errors, but of course this still would not catch the serious crashes where the CPU halts!
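
For Python-level exceptions, a single try/except around the program's entry point comes fairly close to a global handler. The sketch below is only an illustration under that assumption: `run_guarded`, `flaky_main` and the `on_failure` callback are hypothetical names, and on a board the callback would presumably log the error and call `machine.reset()` or go back to deep sleep. As noted above, it cannot help when a call blocks forever or the CPU halts.

```python
def run_guarded(main, on_failure):
    """Run main(); on any Python-level exception, call on_failure(exc)."""
    try:
        main()
    except Exception as exc:
        # Assumption: on a real device on_failure would reset or sleep.
        on_failure(exc)
        return False
    return True


def flaky_main():
    # Stands in for application code hit by a dropped connection.
    raise OSError("connection dropped")


failures = []
ok = run_guarded(flaky_main, failures.append)
print(ok, failures)
```

Combined with a soft watchdog, this covers both an unexpected exit from the main loop and a loop that stops making progress.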

Thanks again.

Re: Soft Watchdog

Post by pythoncoder » Tue Oct 31, 2017 5:12 am

Oh dear, the fault from hell :( I'd be interested to hear if your soft watchdog can enable recovery from a blocked socket; this might be useful to me.

I don't know if your application uses DNS but the current MicroPython implementation of getaddrinfo() blocks. In particular NTP access uses DNS to retrieve a timeserver from the pool. In my application this offers a potential source of blocking behaviour. Currently I work round this by checking for connectivity to a DNS server before proceeding, but this is a kludge. I believe work is in progress to implement nonblocking UDP and DNS requests.
