An error which occur in time

newb
Posts: 43
Joined: Wed Jan 10, 2018 8:19 pm
Location: Bulgaria

Post by newb » Fri Nov 30, 2018 7:38 am

Hi, I have an ESP32 with the standard uPython port, which:

1. connects to WiFi
2. gets the correct time from NTP
3. pulls some data from two APIs with the urequests library
4. shows the data on screen
5. goes to deep sleep for 15 min

Sometimes this works for 3 days; sometimes it freezes after about 2 hours of wake-sleep cycles and the screen stops updating. Currently I keep a counter in rtc.memory(), increment it on every wakeup, and show it on screen along with the date and time of the wakeup. This way I can check whether the module is still working.

However, is there a way to log exactly what is going on so I can debug this freezing?
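
One possible way to get such a log, as a minimal sketch: wrap the wake-cycle code in a function and append any uncaught exception to a file before going back to sleep. The names `log_exception`, `run_logged` and `error.log` are made up for illustration, and the sketch assumes the board has a writable filesystem. It only records exceptions; a call that blocks forever raises nothing and will not show up in the log.

```python
import sys


def log_exception(exc, path="error.log"):
    """Append the traceback of exc to a log file (hypothetical helper)."""
    with open(path, "a") as f:
        if hasattr(sys, "print_exception"):
            # MicroPython provides sys.print_exception(exc, file)
            sys.print_exception(exc, f)
        else:
            # Fallback so the sketch also runs under CPython
            import traceback
            traceback.print_exception(type(exc), exc, exc.__traceback__, file=f)


def run_logged(main, path="error.log"):
    """Run main(); return True on success, False if an exception was logged."""
    try:
        main()
        return True
    except Exception as exc:
        log_exception(exc, path)
        return False
```

The idea would be to call `run_logged(main)` from the boot script, where `main` is whatever function does the connect, fetch and display steps, and then enter deep sleep regardless of the result.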

Thanks.

danielm
Posts: 167
Joined: Mon Oct 05, 2015 12:24 pm

Re: An error which occur in time

Post by danielm » Fri Nov 30, 2018 11:51 am

I am experiencing similar issues with the Pycom port. Check this post and the next 4 posts:
https://forum.pycom.io/topic/3985/new-l ... e-40343/41
Currently some socket operations do not time out even when a timeout is set. Maybe this is the case for the micropython.org port as well.

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

WiFi behaviour as seen by MicroPython

Post by pythoncoder » Sat Dec 01, 2018 5:47 pm

My view after spending a lot of time on this is as follows:
  • WiFi can suffer brief outages. Devices with an underlying OS do a good job of hiding these events so the WiFi can be perceived as rock solid.
  • Bare metal devices don't shield you from this: outages must be handled in code.
  • Conditions can be local: I have a long running test with 3 ESP8266 devices reporting outages back to a server. Each one experiences different numbers.
  • If code is waiting on a read operation on a socket, the case where the sender has nothing to send and the case where an outage has occurred are indistinguishable: no exception is raised. If timeouts are properly implemented, a timeout will occur.
  • For reliability both ends of the link need to detect the outage, close any open sockets and initiate reconnection. If only one end does this the other will fail.
  • If connectivity has been lost but not yet detected, an application may continue to feed data into a non-functional socket. This will probably lead to a crash (definitely on ESP8266).
  • To prevent this, an application which needs to write data must first verify connectivity. As stated above this can only be done by a read timeout. This implies the other end of the link sending data more frequently than the timeout period.
I have covered this in detail here.
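
The read-timeout check described in the list above can be sketched roughly as follows. This is an illustration only, not the code from the linked write-up: `link_alive` and its parameters are hypothetical names, and it assumes the other end sends something (for example a keepalive byte) more often than `timeout_s`.

```python
import socket


def link_alive(sock, timeout_s=1.5):
    """Return True if the peer sent anything within timeout_s seconds.

    Hypothetical helper: assumes the peer transmits more frequently
    than timeout_s, so silence is treated as an outage.
    """
    sock.settimeout(timeout_s)
    try:
        data = sock.recv(64)
    except OSError:
        # A timeout (or any other socket error): treat the link as down
        return False
    # An empty read means the peer closed the connection
    return len(data) > 0
```

When this returns False, the application would close the socket and reconnect before attempting any write, and the other end would need to do the same.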

If these issues are addressed it is possible to write rock-solid networking applications on ESP8266. I don't believe this is possible on ESP32 because of this issue which I believe is still outstanding.

Unless I'm missing something this has potential implications for many networking applications running on bare metal MicroPython targets.

[Footnote]
Brief outages seem independent of hardware. I have a project (an astronomical wall clock) I built some 12 years ago; this has an LED showing WiFi connectivity. It is ~5 metres from the access point and both can see each other via an open doorway. Through three iterations of AP from different manufacturers the LED's behaviour has been consistent: occasionally it briefly goes out. In the long running ESP8266 test above, I logged over 100 outages over 3 devices in a week.
Peter Hinch
Index to my micropython libraries.

newb
Posts: 43
Joined: Wed Jan 10, 2018 8:19 pm
Location: Bulgaria

Re: An error which occur in time

Post by newb » Mon Dec 03, 2018 4:46 pm

Thank you, Daniel and Peter! I'll go straight for the wifi/sockets issue.

Peter, this is the third obstacle which I'm facing in uPython and which seems to be solvable with your uasyncio library.
Thank you for your great, thorough documentation!

newb
Posts: 43
Joined: Wed Jan 10, 2018 8:19 pm
Location: Bulgaria

Re: An error which occur in time

Post by newb » Wed Dec 05, 2018 2:35 pm

So it seems (I'm not 100% sure) that the hanging is caused by a timeout of the POST request made with the urequests lib.

I tried to rewrite my code with uasyncio, but it might be overkill, as I have no concurrent routines running; the only problem is the urequests POST request hanging.

I tried to implement the example for Coro Timeout https://github.com/peterhinch/micropyth ... h-timeouts, but urequests throws an exception before the timeout is detected by the Coro.

OSError: [Errno 110] ETIMEDOUT
Maybe I implemented the example incorrectly, but I ended up simply catching the OSError, which stops the script from hanging.

If the problem reoccurs, it will likely be caused by a WiFi outage, as Peter suggested.
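
For anyone reading later, the catch-and-continue approach above might look something like this minimal sketch. It is not the original script: `post_with_retry`, `retries` and `delay_s` are made-up names, and the request function is passed in as an argument so the wrapper does not depend on a particular HTTP library.

```python
import time


def post_with_retry(post, url, retries=3, delay_s=2, **kwargs):
    """Call post(url, **kwargs), retrying when it raises OSError.

    Hypothetical helper: returns the response on success, or None if
    every attempt failed (e.g. OSError: [Errno 110] ETIMEDOUT).
    """
    for attempt in range(retries):
        try:
            return post(url, **kwargs)
        except OSError:
            # Timeout or other network error: wait briefly, then retry
            time.sleep(delay_s)
    return None
```

With urequests this could be called as `post_with_retry(urequests.post, url, json=payload)`, where `url` and `payload` are placeholders. If the result is None, the device can skip the screen update and go back to deep sleep until the next cycle.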
