[SOLVED] Crashing/freezing every few hours

Post by mkiotdev » Sun Nov 11, 2018 9:44 am

I'm working on a prototype using an ESP8266 (NodeMCU) and MicroPython. I'm not new to software development or microcontrollers, but I am new to the ESP and MicroPython. The previous implementation ran on a Raspberry Pi and proved stable over the course of 10 months. As the Pi is quite an overhead, I'm now trying the ESP8266, and I have decided on MicroPython because it seems like a middle ground.

main.py is currently ~300 lines of code. It reads temperatures with DS18B20 and controls 8 relays accordingly through an MCP23017 port expander. In addition, an HTTP POST with the data is sent to a server every 60 s.

I do not know how to activate BBCode (or can't because of forum settings), so here is the main loop:

https://pastebin.com/G9AW2bLK

Because the RTC is not reliable, the main loop sets it through ntptime.settime() every hour. In addition, there is a call to urequests.request every minute.
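
The pastebin contents are not reproduced here, so the following is only a minimal sketch of the schedule described above (time sync every hour, POST every minute), not the actual main.py. The names `Every`, `step`, `main_loop`, `sync_time` and `post_data` are hypothetical; on the board, `sync_time` would wrap ntptime.settime() and `post_data` would wrap the urequests call. Each network call sits in its own try/except so that a single failed request does not end the loop.

```python
import time


class Every:
    """Tracks whether a fixed interval (in seconds) has elapsed."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.last = None  # None means "never ran", so the first check is due

    def due(self, now_s):
        if self.last is None or now_s - self.last >= self.interval_s:
            self.last = now_s
            return True
        return False


def step(now_s, ntp_timer, post_timer, sync_time, post_data):
    """Run one pass of the loop and return the errors that were caught."""
    errors = []
    if ntp_timer.due(now_s):
        try:
            sync_time()  # on the board: ntptime.settime()
        except Exception as e:  # e.g. OSError on a network timeout
            errors.append(("ntp", e))
    if post_timer.due(now_s):
        try:
            post_data()  # on the board: the urequests call
        except Exception as e:
            errors.append(("post", e))
    return errors


def main_loop(sync_time, post_data):
    ntp_timer, post_timer = Every(3600), Every(60)
    elapsed_s = 0  # counted locally so that setting the RTC cannot skew the schedule
    while True:
        for name, err in step(elapsed_s, ntp_timer, post_timer, sync_time, post_data):
            print("error in", name, ":", err)
        time.sleep(1)
        elapsed_s += 1
```

If `post_data` uses urequests, closing the returned response object after use is probably worthwhile, since each response holds a socket.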

Everything works fine; then after 8-10 hours, with no obvious pattern, the board freezes or crashes. The GPIO2 LED stops flashing (see the pastebin link) and the WebREPL is not accessible. If I connect a USB cable, the board seems to reset itself, and a reset resolves the issue.

How to debug this thing?
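
One possible first step, sketched under assumptions: record the cause of the last reset at every boot, so that the log shows whether the board came back from a power-on, a watchdog or a software reset. The helper names and the file name `reset.log` are hypothetical. `machine.reset_cause()` and the `*_RESET` constants are looked up with `getattr`, because which of them exist depends on the port.

```python
try:
    import machine  # only available on the board
except ImportError:
    machine = None  # lets the sketch be exercised off-device

RESET_NAMES = ("PWRON_RESET", "HARD_RESET", "WDT_RESET", "DEEPSLEEP_RESET", "SOFT_RESET")


def reset_cause_name(cause, mod):
    """Map a numeric reset cause to the name of the matching constant in mod."""
    for name in RESET_NAMES:
        if getattr(mod, name, None) == cause:
            return name
    return "unknown ({})".format(cause)


def log_reset_cause(path="reset.log"):
    """Append the cause of the last reset to a file; returns the name or None."""
    if machine is None:
        return None
    name = reset_cause_name(machine.reset_cause(), machine)
    with open(path, "a") as f:
        f.write(name + "\n")
    return name
```

Calling `log_reset_cause()` once at the top of main.py would be enough; the file then accumulates one line per boot.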

As for the solution, I'm thinking along the lines of:
- WDT: it seems it is not implemented for ESP8266; I have not tested it. A no-go?
- Memory leaks? Do gc.collect() every so often.
- machine.reset every n hours?
- An external WDT?
- Use an ESP32?

What do you think? Guys and gals with more experience ...
Last edited by mkiotdev on Wed Dec 26, 2018 1:54 pm, edited 1 time in total.

Post by kevinkk525 » Sun Nov 11, 2018 11:40 am

In another thread it was suggested that ntptime synchronization could be blocking; others said that the power source could be the problem, resulting in a freeze of the board.
In my own experience, all my ESP8266 boards freeze every 1-3 days and need to be reset by a software watchdog using interrupts. This indicates that not everything is frozen, only that something is stuck. Although I tried to debug it for several weeks, I could not find the reason, and at that time I did not even use ntptime synchronization, only MQTT. But the more socket code I added, the more frequent the freezes became. Now some of my NodeMCUs get reset twice a day.
I stopped searching for the cause and just accepted that a software watchdog solves the problem as well as is possible. Of course I would be happy if someone found the cause of this behaviour.

This is the WDT I'm using (actually an adapted version as I'm using uasyncio):

import gc
import machine
from sys import platform
gc.collect()


class WDT:
    """Software watchdog: a periodic timer resets the board unless feed() is called in time."""

    def __init__(self, id=0, timeout=120, use_rtc_memory=True):
        # The timer fires every timeout/10 seconds; ten unfed ticks trigger a reset.
        self._timeout = timeout / 10
        self._counter = 0
        self._timer = machine.Timer(id)
        self._use_rtc_memory = use_rtc_memory
        self.init()
        # Report whether the previous reset was caused by the watchdog.
        try:
            with open("watchdog.txt", "r") as f:
                if f.read() == "True":
                    print("Reset reason: Watchdog")
        except Exception as e:
            print(e)  # file probably just does not exist
        # Clear the marker; fall back to RTC memory if the file cannot be written.
        try:
            with open("watchdog.txt", "w") as f:
                f.write("False")
        except Exception as e:
            print("Error saving to file: {!s}".format(e))
            if use_rtc_memory and platform == "esp8266":
                rtc = machine.RTC()
                if rtc.memory() == b"WDT reset":
                    print("Reset reason: Watchdog")
                rtc.memory(b"")

    def _wdt(self, t):
        # Timer callback: add one period and reset once the full timeout has passed.
        self._counter += self._timeout
        if self._counter >= self._timeout * 10:
            # Leave a marker so that the next boot can report the watchdog reset.
            try:
                with open("watchdog.txt", "w") as f:
                    f.write("True")
            except Exception as e:
                print("Error saving to file: {!s}".format(e))
                if self._use_rtc_memory and platform == "esp8266":
                    rtc = machine.RTC()
                    rtc.memory(b"WDT reset")
            machine.reset()

    def feed(self):
        # Call regularly from the main loop to postpone the reset.
        self._counter = 0

    def init(self, timeout=None):
        # timeout is the total timeout in seconds, as in __init__
        if timeout is not None:
            self._timeout = timeout / 10
        self._timer.init(period=int(self._timeout * 1000), mode=machine.Timer.PERIODIC, callback=self._wdt)

    def deinit(self):  # stops the timer, which disables the watchdog
        self._timer.deinit()

Use it like:

import time

wdt = WDT()
while True:
    time.sleep(1)
    wdt.feed()

Kevin Köck
Micropython Smarthome Firmware (with Home-Assistant integration): https://github.com/kevinkk525/pysmartnode

Post by mkiotdev » Mon Nov 12, 2018 10:03 pm

Thank you Kevin!

I have been doing machine.reset every 2 hours and still got a frozen/crashed device after about 10 hours. I do not understand what I am doing wrong. If the ESP8266 is so popular for IoT solutions, how come it crashes after a few hours of work at 60 HTTP POST requests per hour?

If I understand correctly, the WDT implementation you posted relies on a timer which is somehow still running and executes machine.reset when the ESP8266 is crashed/frozen?

The documentation states that timers are "hardware timers", but not much more than that. So this 'hardware' timer has some kind of mechanism to run code when everything else is not working.
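
The counting logic of the WDT posted earlier in the thread can be separated from the hardware and tried on a PC. The sketch below is a hypothetical stand-in: `SoftWatchdog`, `tick` and the injected `reset` callable are illustration names, not part of the original class. On the board, `tick` would be the machine.Timer callback and `reset` would be machine.reset. It also makes the limitation visible: the watchdog only helps as long as the timer callback still gets to run while the main loop is stuck.

```python
class SoftWatchdog:
    """Counts timer ticks and calls reset() after max_ticks ticks without feed()."""

    def __init__(self, reset, max_ticks=10):
        self._reset = reset
        self._max_ticks = max_ticks
        self._ticks = 0

    def tick(self, timer=None):
        # On the board this would be registered as the periodic Timer callback.
        self._ticks += 1
        if self._ticks >= self._max_ticks:
            self._reset()

    def feed(self):
        # Called from the main loop to show that it is still alive.
        self._ticks = 0


if __name__ == "__main__":
    resets = []
    wd = SoftWatchdog(reset=lambda: resets.append("reset"), max_ticks=3)
    wd.tick()
    wd.tick()
    wd.feed()  # main loop alive: counter cleared before the limit
    wd.tick()
    wd.tick()
    wd.tick()  # main loop stuck: three ticks without feed
    print(resets)
```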

Anybody, some thoughts?

Post by jickster » Tue Nov 13, 2018 4:50 am

The garbage collector is not compacting.

As memory is allocated/freed over time, the memory becomes fragmented.

At some point, it is possible that an allocation will fail because there’s not a big enough chunk of contiguous heap.

How to detect this?
If you have access to a C debugger, you can put a breakpoint at the line where the “not enough memory” exception is thrown.

If you don't have a C debugger, you can allocate the emergency exception buffer and then catch the exception with try/except.
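
A minimal sketch of the second method, under assumptions: `run_guarded`, `work` and `on_memory_error` are hypothetical names, and the buffer size of 100 bytes is the value commonly shown in the MicroPython documentation rather than something tuned for this program. The import is guarded so that the same code also runs off-device.

```python
import gc

try:
    import micropython  # MicroPython only
    # Reserve RAM up front so exception details survive when the heap is exhausted.
    micropython.alloc_emergency_exception_buf(100)
except (ImportError, AttributeError):
    micropython = None  # running off-device or on a build without the buffer


def run_guarded(work, on_memory_error):
    """Call work(); if it raises MemoryError, report it and return False."""
    try:
        work()
        return True
    except MemoryError as e:
        gc.collect()  # free what can be freed before doing anything else
        on_memory_error(e)
        return False
```

Wrapping one pass of the main loop as `run_guarded(one_pass, print)` (with `one_pass` standing for whatever the loop body is) would at least show whether a failed allocation precedes the freeze.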



Post by mkiotdev » Tue Nov 13, 2018 8:11 am

I thought it was something like that, some kind of memory shortage. But if this is the case, is it a problem of the MicroPython implementation? This does not make me appreciate MicroPython for ESP8266.

I am going to ask in the ESP32 subsection of the forum whether this is also the case there. The ESP32 does have more memory, but throwing more memory at this won't resolve the issue, just delay the crash.

Do you have any more pointers? It currently seems that I should look further than MicroPython for the ESP8266.

Post by kevinkk525 » Tue Nov 13, 2018 10:21 am

With such short programs I highly doubt that it has anything to do with RAM fragmentation, especially if the program only allocates RAM temporarily and you call gc.collect() from time to time.
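
To check this rather than guess, the free heap can be sampled after each collection. This is only a sketch: `collect_and_report` and `MemLog` are hypothetical helpers, and `gc.mem_free()` is looked up with `getattr` because it exists in MicroPython but not in CPython.

```python
import gc


def collect_and_report():
    """Run a collection and return the free heap in bytes, or None if unavailable."""
    gc.collect()
    mem_free = getattr(gc, "mem_free", None)  # MicroPython only
    return mem_free() if mem_free else None


class MemLog:
    """Keeps the lowest free-heap value seen so far."""

    def __init__(self):
        self.lowest = None

    def sample(self):
        free = collect_and_report()
        if free is not None and (self.lowest is None or free < self.lowest):
            self.lowest = free
        return free
```

A value that keeps falling over hours, even right after gc.collect(), would point to objects being kept alive. Fragmentation is not visible from the free total alone.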

Many people don't have any problems with the esp8266 getting frozen. For a long time I seemed to be the only one having that problem.

Post by pythoncoder » Tue Nov 13, 2018 10:54 am

These problems are often hardware related. In my testing of the resilient MQTT driver it ran for days near the limit of WiFi range without rebooting. This was on the reference hardware running from a USB power bank.

By contrast my attempts to get anything to run on the Sonoff WiFi switch failed with numerous crashes. I could take code which ran forever on the reference board, copy it to the Sonoff, and get total unreliability. I never figured out what was wrong with the hardware - testing with a meter and oscilloscope showed no evident issue. Perhaps there was some kind of RF issue which I lacked the means to measure.

The only conclusion I could draw was to be very picky about hardware including power supplies.
Peter Hinch

Post by jickster » Tue Nov 13, 2018 4:58 pm

mkiotdev wrote: I thought it was something like that, some kind of memory shortage. But if this is the case, is it a problem of the MicroPython implementation? This does not make me appreciate MicroPython for ESP8266.

I am going to ask in the ESP32 subsection of the forum whether this is also the case there. The ESP32 does have more memory, but throwing more memory at this won't resolve the issue, just delay the crash.

Do you have any more pointers? It currently seems that I should look further than MicroPython for the ESP8266.

First verify that this is the problem; I gave you two methods to do so.

If it is the problem, you can set the gc threshold to 0 so that it performs a gc after every allocation. This will greatly reduce fragmentation, though it doesn't guarantee you won't ever have a problem; you'd need compaction for that.

That is drastic but if it fixes the issues AND you can tolerate the timing hit, problem solved.
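
A minimal sketch of that setting, assuming the firmware build exposes `gc.threshold()`. The wrapper `set_gc_threshold` is a hypothetical helper; it exists only so that the call is skipped where the function is missing (CPython, for example, has no `gc.threshold`). That a value of 0 collects on every allocation is taken from the suggestion above and has not been verified here.

```python
import gc


def set_gc_threshold(amount):
    """Apply gc.threshold(amount) where available; return True if it was applied."""
    threshold = getattr(gc, "threshold", None)  # MicroPython-specific
    if threshold is None:
        return False
    threshold(amount)
    return True


if __name__ == "__main__":
    applied = set_gc_threshold(0)  # the drastic setting discussed above
    print("gc threshold applied:", applied)
```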



Post by mkiotdev » Wed Nov 14, 2018 9:51 pm

I've made some adjustments to the code (more gc.collect calls, more frequent machine.reset) and got a full, stable 24+ hours of operation, everything fine. But I have identified one part of the problem.

The power supply. The PS is very close to (on the same cable as) the water pumps (low power, used for heating, about 100-150 W altogether). The pumps do not draw much power; nevertheless, the moment of switching them on and off can be troublesome for my NodeMCU ESP8266, which froze instantly after I switched a pump on and off two or three times in a row. The power supply I use is a 240 VAC to 5 VDC 3 A converter. I will try to find a better solution; any advice on that?

Post by Roberthh » Thu Nov 15, 2018 7:33 am

We have that in the lab too: switching air compressor pumps affects the electronics, even if they are not connected to it at all. So we had to move them away. You could try supplying the ESP8266 from a separate power supply (separate from the supply of the relays you use to switch the pump). Two smaller power supply units would do. If you have the chance, you can also try running the ESP8266 from a different AC supply line than the pump. That could indicate whether spikes on the AC line are already the culprit.
