Writing robust network code

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Post by pythoncoder » Fri Nov 30, 2018 10:53 am

This forum is awash with people reporting crashes and LmacRxBlk errors. I have long been convinced that these come down to two causes, and have set out to demonstrate this. Causes are:
  • Defective hardware or power supplies.
  • Incorrect programming.
I have written a guide to writing resilient networking applications, which may be found here. The repo includes a demo which has clocked up a cumulative total of about 360 hours of running, through more than 100 WiFi outages, without error.

tl;dr summary
The key is detecting and responding to WiFi outages. This necessitates periodically sending data in both directions through the socket and using a timeout at each end to detect loss of connectivity. Brief outages, unnoticed by devices with an OS, are very common and can lead to crashes unless applications are designed appropriately.
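
A minimal sketch of the receiving side of this idea, assuming a line-oriented protocol in which the peer sends at least one line (data, or an empty keepalive line) per interval. The names `read_line` and `TIMEOUT_S` are hypothetical and are not taken from the guide. Only the standard socket API is used, so a timeout surfaces as an OSError in the same way as a broken socket:

```python
import socket

TIMEOUT_S = 1.5  # hypothetical: somewhat longer than the peer's keepalive interval


def read_line(sock, timeout_s=TIMEOUT_S):
    """Return one newline-terminated line from sock, without the newline.

    Raises OSError if a read waits longer than timeout_s or the peer closes
    the connection, so the caller can handle both failures the same way.
    """
    sock.settimeout(timeout_s)  # applies to each recv() call below
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)  # a timeout here raises an OSError (or subclass)
        if not chunk:  # empty read: the peer closed the socket
            raise OSError("connection closed by peer")
        buf += chunk
    return buf[:-1]
```

The caller would wrap `read_line` in `try ... except OSError`, close the socket and reconnect. The sending side writes a keepalive line whenever it has no data to send, and the same applies in the opposite direction.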

If I am correct it calls into question the resilience of most ESP8266 applications.

Comments and observations are extremely welcome.
Peter Hinch
Index to my micropython libraries.

kevinkk525
Posts: 969
Joined: Sat Feb 03, 2018 7:02 pm

Re: Writing robust network code

Post by kevinkk525 » Fri Nov 30, 2018 8:06 pm

Thanks a lot for your effort in this matter! I read your document about resilient network code, it was very interesting.

Applying that knowledge to MQTT would mean using a very short keepalive time, so that there is a near-constant exchange of data packets and connection problems are recognised quickly. Apart from that, it seems very close to your original approach of a resilient MQTT driver, but also more complex code-wise (Cancellables, Events), which makes it bigger. I'm curious about a minimal implementation for the ESP8266.

It was also interesting that your demo code was completely resilient when sending messages, even though it does not send back confirmation that the data was correctly received; it assumes that a broken socket will always throw an exception.

Does the ESP32 do this differently, making that platform more resilient?
Kevin Köck
Micropython Smarthome Firmware (with Home-Assistant integration): https://github.com/kevinkk525/pysmartnode

Re: Writing robust network code

Post by pythoncoder » Sat Dec 01, 2018 7:32 am

kevinkk525 wrote:
Fri Nov 30, 2018 8:06 pm
...
Interesting was also that your demo code was completely resilient sending messages as it does not send confirmation back that the data was correctly received, assuming that a broken socket will always throw an exception...
My demo assumes that a broken receiving socket will either throw an exception or time out. In the majority of cases I observed the problem is a brief WiFi outage. This does not cause an exception and is caught by a timeout. A real application would send an acknowledge. This would not improve outage detection but would provide a guarantee that a data packet had been received. In the demo, as stated in the doc, if an outage occurs while data is being sent, the outage will be detected but the packet will be lost.
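
A rough sketch of what such an acknowledge scheme might look like on the sending side, assuming each message carries a numeric ID which the receiver echoes back. The class `AckedSender` and the `id:payload` framing are hypothetical illustrations and are not part of the demo:

```python
class AckedSender:
    """Keep each message until the peer acknowledges it (hypothetical helper)."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}  # message id -> payload, awaiting an acknowledge

    def frame(self, payload):
        """Register payload as unacknowledged and return the line to transmit."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = payload
        return "{}:{}\n".format(msg_id, payload)

    def on_ack(self, msg_id):
        """The peer confirmed receipt, so the message need not be resent."""
        self._pending.pop(msg_id, None)

    def to_resend(self):
        """Lines to retransmit after an outage, oldest first."""
        return ["{}:{}\n".format(i, p) for i, p in sorted(self._pending.items())]
```

After an outage is detected and the connection re-established, the application would transmit everything returned by `to_resend()`. Since an acknowledge can itself be lost, the receiver would need to discard any ID it has already seen.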
kevinkk525 wrote: Is the esp32 doing it differently making that platform more resilient?
Far from being more resilient, the ESP32 is incapable of being made resilient owing to this issue. In my view the ESP32 is a toy until this is fixed.
kevinkk525 wrote: also more complex codewise (Cancellables, Events) making the size bigger
The asynchronous MQTT code is more complex (638 lines vs 151 including comments). My aim was to demonstrate the concepts and write readily understood code rather than to produce an optimised solution. It does seem that the demo uses more RAM but I haven't studied this in any detail as it wasn't my objective.

The take-home point of all this is that the ESP8266 is capable of rock-solid reliability. Kudos to @Damien and @pfalcon for achieving this :D

Re: Writing robust network code

Post by pythoncoder » Sun Dec 02, 2018 8:59 am

kevinkk525 wrote:
Fri Nov 30, 2018 8:06 pm
...
Putting that knowledge into mqtt it would mean having a very short keepalive time in order to have a "constant" exchange of data packets to recognize connection problems quickly...
My resilient MQTT has a ping_interval value which controls the frequency with which MQTT ping packets are sent to the broker. Setting this is probably the most efficient approach.
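
To illustrate how a ping interval bounds the time taken to detect an outage, here is a small sketch. It is not the driver's actual code: the class `LinkMonitor` and the `response_time_s` parameter are hypothetical, and the clock is injectable so that the logic can be checked without waiting in real time.

```python
import time


class LinkMonitor:
    """Decide when to ping and when to declare the link down (hypothetical)."""

    def __init__(self, ping_interval_s, response_time_s, clock=time.time):
        self.ping_interval_s = ping_interval_s
        self.response_time_s = response_time_s
        self._clock = clock
        self._last_rx = clock()    # when anything was last received
        self._last_ping = clock()  # when a ping was last sent

    def on_receive(self):
        """Call whenever any packet (data or ping response) arrives."""
        self._last_rx = self._clock()

    def ping_due(self):
        """Return True once per ping interval; the caller then sends a ping."""
        if self._clock() - self._last_ping >= self.ping_interval_s:
            self._last_ping = self._clock()
            return True
        return False

    def link_down(self):
        """True if nothing has arrived for a ping interval plus response time."""
        limit = self.ping_interval_s + self.response_time_s
        return self._clock() - self._last_rx > limit
```

Under these assumptions the worst-case detection time is roughly `ping_interval_s + response_time_s`, so the ping interval directly determines how quickly an outage is noticed.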

Re: Writing robust network code

Post by kevinkk525 » Sun Dec 02, 2018 9:43 am

Yes, but the user has to define the right ping_interval. Many might choose a very long period.