What would you recommend I do here ...

General discussions and questions about development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
kesterlester
Posts: 23
Joined: Sat Nov 17, 2018 10:04 pm


Post by kesterlester » Thu Mar 18, 2021 11:54 pm

My question is not a "What problem do I have here?" question. Instead it is more of a "What sorts of approaches do you find most helpful for debugging pyboard issues where you have limited connectivity, and where the problem might only be triggered by extended periods without network usage?" question.
Perhaps some of you can suggest:
  • ways I could get run-time debug info from the wlan() subsystem,
  • resources/documentation that explain how the pyboard wlan() subsystem is designed to cope with things like momentary/temporary loss of signal or expiry and renewal of DHCP leases and other network connection issues (i.e. what are its algorithms for managing its own connections)
  • tips about clever methods of connecting to and debugging a running process on a pyboard that will not require me to Ctrl-C break into the asyncio event loop (see later for a description of what I am talking about here)
  • anything else?
The background to the above question is as follows:


I have a D-series SF2W pyboard (i.e. with wifi) which uses asyncio to run a pair of tasks which, working together, control my central heating. One task runs a simple webserver which handles requests from users. The other is the background management task which decides when to start/stop events that have been scheduled for the future. Except when being programmed, the only link between the pyboard and the outside world is its wifi connection.

Most of the time the system works very well. But occasionally, for reasons I don't yet understand, the pyboard appears to stop responding via the web interface. This could be after a week of constant uptime in which it has been working smoothly, or it could be after being up for just a day.

Interestingly, if I discover it has reached this "locked up" state, go over, plug in a USB cable and open a screen connection to the pyboard, it appears that the asyncio event loop is still running fine, and there is no obvious "problem". Indeed, from this position I can interrupt the event loop, re-issue the wlan().connect(...) which re-establishes the wifi connection, and then re-enter the asyncio event loop, and the pyboard is able to carry on running my program from where it was, happily talking to the outside world again.
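The manual recovery described above offers a chance to capture what the driver believes before the reconnect wipes the evidence. The following is a minimal sketch, not a tested pyboard recipe: `link_report` is a hypothetical helper name, and it assumes the WLAN object provides the `isconnected()`, `status()` and `ifconfig()` methods documented for `network.WLAN`.

```python
def link_report(wlan):
    """Collect what the WLAN driver currently believes about the link."""
    # ifconfig() is documented to return (ip, netmask, gateway, dns).
    ip, netmask, gateway, dns = wlan.ifconfig()
    return {
        "connected": wlan.isconnected(),
        "status": wlan.status(),
        "ip": ip,
        "gateway": gateway,
    }

# Intended use at the REPL after interrupting the event loop
# (the name `wlan` is an assumption about the running program):
#   print(link_report(wlan))   # before re-issuing wlan.connect(...)
#   print(link_report(wlan))   # again once the connection is back
```

Comparing the two reports would show whether the driver itself considered the link down, or whether it still reported a connection that no longer passed traffic.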

This leaves me suspecting that the pyboard network stack is somehow ending up in a state where it believes it is no longer wifi-connected and needs to be reminded to connect, though I cannot yet tell whether the cause is inside or outside the pyboard ...
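One way to test this suspicion, and to learn when the state changed rather than when a user noticed, might be a third asyncio task that records only transitions in the driver's reported link state. This is a sketch under assumptions: `watch_link`, `snapshot` and the `events` list are hypothetical names, the polling period is arbitrary, and it is assumed (not verified here) that `isconnected()` and `status()` read local driver state without putting packets on the network.

```python
try:
    import uasyncio as asyncio  # older MicroPython firmware
except ImportError:
    import asyncio              # newer MicroPython, or CPython for testing
import time


def snapshot(wlan):
    """Return the link state the WLAN object currently reports."""
    return (wlan.isconnected(), wlan.status())


async def watch_link(wlan, events, period_s=5, max_events=50):
    """Append (timestamp, connected, status) to `events` on each state change."""
    last = None
    while True:
        now = snapshot(wlan)
        if now != last:
            events.append((time.time(), now[0], now[1]))
            if len(events) > max_events:
                events.pop(0)  # drop the oldest entry to bound memory use
            last = now
        await asyncio.sleep(period_s)
```

Because nothing is recorded while the state is unchanged, the cost per poll is two method calls and a comparison, which should keep the disturbance to the other tasks small.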

Now, I don't expect anyone on this forum to be able to tell me what my problem is. After all, the problem may be one of my own creation that emerges from all my own code which is too long to post here ....

However I feel I could benefit from advice as to how best to go about debugging an issue like this given that it presents me with a number of challenges:
  • The disconnect can take days to appear,
  • users may only notice the problem hours after it actually struck,
  • while I could potentially poll lots, and/or log lots, such actions are the very sorts of things which could affect asyncio loop related issues,
  • the pyboard is part of a much bigger system (the house wifi) so the causes could be very complex (e.g. triggered only by the TV suddenly requesting a large wifi stream for netflix, etc)
  • I am not aware of any ways of getting the wlan() implementation to spit out debug information, which is something I'd (presumably) need it to do if the bug were in there,
  • I like the way that the current design (once it has booted) never needs to send out IP packets except in response to web requests from users. This makes it silent/clean from a net perspective, and I'd like to keep it that way. I therefore would prefer not to use any (easy) sticking plaster solutions (dummy keepalive packets, or once-per-minute net reconnection attempts, etc) as these are just wasteful. I'd rather never fix the problem than kludge it.
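One approach that stays within the last constraint, and avoids breaking into the event loop with Ctrl-C, would be to serve any recorded diagnostics from the existing web server, so the board still only transmits in reply to a request. The sketch below shows only the formatting step. `debug_response` is a hypothetical name, the `(timestamp, connected, status)` tuple layout is an assumed log format, and how the result is wired into the server's request handler depends on code not shown in this thread.

```python
def debug_response(events, now):
    """Build a plain-text HTTP response listing recorded link events.

    `events` holds (timestamp, connected, status) tuples and `now` is the
    current time in the same units, so that ages can be reported.
    """
    lines = ["link events (oldest first):"]
    for ts, connected, status in events:
        state = "up" if connected else "down"
        lines.append("%d s ago  %s  status=%s" % (now - ts, state, status))
    payload = ("\r\n".join(lines) + "\r\n").encode()
    header = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: %d\r\n\r\n" % len(payload)
    )
    return header.encode() + payload
```

A request for some debug path could then return these bytes. The obvious limitation is that the page is unreachable while the link is down, so it can only report what happened after connectivity has been restored.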

Chris


Re: What would you recommend I do here ...

Post by kesterlester » Fri Mar 19, 2021 12:29 am

There seem to be a lot of juicy looking WIFI disconnection codes in https://github.com/micropython/micropyt ... inc/wlan.h

How can I get the system to report these if/when it disconnects?

E.g. the code below is centred on line 75 and includes a SL_DISASSOCIATED_DUE_TO_INACTIVITY which looks interesting:

/* WLAN Disconnect Reason Codes */
#define  SL_DISCONNECT_RESERVED_0                                                                       (0)
#define  SL_DISCONNECT_UNSPECIFIED_REASON                                                               (1)
#define  SL_PREVIOUS_AUTHENTICATION_NO_LONGER_VALID                                                     (2)
#define  SL_DEAUTHENTICATED_BECAUSE_SENDING_STATION_IS_LEAVING                                          (3)
#define  SL_DISASSOCIATED_DUE_TO_INACTIVITY                                                             (4)
#define  SL_DISASSOCIATED_BECAUSE_AP_IS_UNABLE_TO_HANDLE_ALL_CURRENTLY_ASSOCIATED_STATIONS              (5)
#define  SL_CLASS_2_FRAME_RECEIVED_FROM_NONAUTHENTICATED_STATION                                        (6)
#define  SL_CLASS_3_FRAME_RECEIVED_FROM_NONASSOCIATED_STATION                                           (7)
#define  SL_DISASSOCIATED_BECAUSE_SENDING_STATION_IS_LEAVING_BSS                                        (8)
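If a numeric reason code could be obtained at disconnect time, a lookup table would make log entries readable. The sketch below simply mirrors the nine constants quoted above. `reason_name` is a hypothetical helper, and whether the pyboard D firmware exposes such a code to Python at all is not established in this thread (the SL_ prefix suggests a header from a different WLAN driver).

```python
# Names copied from the header excerpt quoted above, keyed by their values.
DISCONNECT_REASONS = {
    0: "SL_DISCONNECT_RESERVED_0",
    1: "SL_DISCONNECT_UNSPECIFIED_REASON",
    2: "SL_PREVIOUS_AUTHENTICATION_NO_LONGER_VALID",
    3: "SL_DEAUTHENTICATED_BECAUSE_SENDING_STATION_IS_LEAVING",
    4: "SL_DISASSOCIATED_DUE_TO_INACTIVITY",
    5: "SL_DISASSOCIATED_BECAUSE_AP_IS_UNABLE_TO_HANDLE_ALL_CURRENTLY_ASSOCIATED_STATIONS",
    6: "SL_CLASS_2_FRAME_RECEIVED_FROM_NONAUTHENTICATED_STATION",
    7: "SL_CLASS_3_FRAME_RECEIVED_FROM_NONASSOCIATED_STATION",
    8: "SL_DISASSOCIATED_BECAUSE_SENDING_STATION_IS_LEAVING_BSS",
}


def reason_name(code):
    """Translate a numeric disconnect reason into its symbolic name."""
    return DISCONNECT_REASONS.get(code, "unknown reason code %d" % code)
```

For example, `reason_name(4)` gives the inactivity entry, which would be the one of most interest for a device that stays silent between web requests.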

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: What would you recommend I do here ...

Post by pythoncoder » Fri Mar 19, 2021 1:48 pm

The problem is quite likely to be caused by brief WiFi outages as discussed here. There are at least two frameworks addressing this issue: resilient MQTT and micropython-iot.
Peter Hinch
Index to my micropython libraries.
