"What problem do I have here?" question. Instead it is more of a "What sorts of ways do you find most helpful for debugging pyboard issues where you have limited connectivity, and where the problem might only be triggered after extended periods without network usage?" question.
Perhaps some of you can suggest:
- ways I could get run-time debug info from the wlan() subsystem,
- resources/documentation that explain how the pyboard wlan() subsystem is designed to cope with things like momentary/temporary loss of signal or expiry and renewal of DHCP leases and other network connection issues (i.e. what are its algorithms for managing its own connections)
- tips about clever methods of connecting to and debugging a running process on a pyboard that will not require me to Ctrl-C break into the asyncio event loop (see later for a description of what I am talking about here)
- anything else?
I have a D-series SF2W pyboard (i.e. with wifi) which uses asyncio to run a pair of tasks which, working together, control my central heating. One task runs a simple webserver which handles requests from users. The other is the background management task which decides when to start/stop events that have been scheduled for the future. Except when being programmed, the only link between the pyboard and the outside world is its wifi connection.
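As a rough illustration of this two-task structure, here is a minimal sketch. It is not the actual code, which is too long to post. The names `handle_request`, `check_schedule`, `scheduler` and `main`, the port number and the reply text are all hypothetical, and the `ticks` parameter exists only so the loop can be stopped when trying the sketch off-board.

```python
try:
    import uasyncio as asyncio  # MicroPython on the pyboard
except ImportError:
    import asyncio  # CPython, for trying the sketch off-board


async def handle_request(reader, writer):
    # Hypothetical handler: read the request line, send a fixed reply.
    await reader.readline()
    writer.write(b"HTTP/1.0 200 OK\r\n\r\nheating: ok\r\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


def check_schedule():
    # Placeholder for the logic that starts/stops scheduled events.
    pass


async def scheduler(check, period_s=1.0, ticks=None):
    # Background task: call check() every period_s seconds.
    # ticks=None means run forever; a number limits the loop (for testing).
    n = 0
    while ticks is None or n < ticks:
        check()
        n += 1
        await asyncio.sleep(period_s)


async def main(port=80):
    # The server accepts connections in the background while the
    # scheduler loop runs in the foreground of the same event loop.
    server = await asyncio.start_server(handle_request, "0.0.0.0", port)
    await scheduler(check_schedule)
```

On the board, something like `asyncio.run(main())` would start both tasks. Nothing in this outline sends a packet unless a client connects first.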
Most of the time the system works very well. But occasionally, for reasons I don't yet understand, the pyboard appears to stop responding via the web interface. This could be after a week of constant uptime in which it has been working smoothly, or it could be after being up for just a day.
Interestingly, if I discover it has reached this "locked up" state, go over, plug in a USB cable and open a screen connection to the pyboard, the asyncio event loop appears to be still running fine, with no obvious "problem". Indeed, from this position I can interrupt the event loop, re-issue the wlan().connect(...) call, which re-establishes the wifi connection, and then re-enter the asyncio event loop. The pyboard then carries on running my program from where it was, happily talking to the outside world again.
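The manual recovery sequence above amounts to something like the following sketch. It assumes a `wlan` object offering the `active()`, `connect()` and `isconnected()` methods of MicroPython's `network.WLAN`. The function name `reconnect` and the timeout values are hypothetical.

```python
import time


def reconnect(wlan, ssid, key, timeout_s=15, poll_s=0.5):
    """Re-issue connect() and wait until the interface reports connected.

    Returns True on success, False if timeout_s elapses first.
    """
    if not wlan.active():
        wlan.active(True)  # make sure the interface is switched on
    wlan.connect(ssid, key)
    deadline = time.time() + timeout_s
    while not wlan.isconnected():
        if time.time() > deadline:
            return False
        time.sleep(poll_s)
    return True
```

At the REPL this would be called as, for example, `reconnect(network.WLAN(network.STA_IF), "my-ssid", "my-key")` after `import network`. It blocks while waiting, so it is suited to the interrupted state described above rather than to running inside the event loop.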
This leaves me suspecting that the pyboard network stack is somehow ending up in a state where it believes it is no longer wifi-connected and needs to be reminded to connect, though I cannot yet tell whether the cause is inside or outside the pyboard ...
Now, I don't expect anyone on this forum to be able to tell me what my problem is. After all, the problem may be one of my own creation that emerges from all my own code which is too long to post here ....
However I feel I could benefit from advice as to how best to go about debugging an issue like this given that it presents me with a number of challenges:
- The disconnect can take days to appear,
- users may only notice the problem hours after it actually struck,
- while I could potentially poll lots, and/or log lots, such actions are the very sorts of things which could affect asyncio-loop-related issues,
- the pyboard is part of a much bigger system (the house wifi) so the causes could be very complex (e.g. triggered only by the TV suddenly requesting a large wifi stream for Netflix, etc),
- I am not aware of any ways of getting the wlan() implementation to spit out debug information, which is something I'd (presumably) need it to do if the bug were in there,
- I like the way that the current design (once it has booted) never needs to send out IP packets except in response to web requests from users. This makes it silent/clean from a net perspective, and I'd like to keep it that way. I would therefore prefer not to use any (easy) sticking-plaster solutions (dummy keepalive packets, once-per-minute net reconnection attempts, etc.) as these are just wasteful. I'd rather never fix the problem than kludge it.
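To make the constraints above concrete, one kind of passive instrumentation that sends no packets might look like the sketch below. It only reads the interface's local state via `isconnected()` and `status()` (the meaning of the `status()` value is port-specific) and records changes in a small in-RAM buffer that can be inspected over USB later. The names `TransitionLog` and `watch_link`, the buffer size and the polling period are all hypothetical, and whether even this much polling disturbs the event loop is exactly the open question.

```python
try:
    import uasyncio as asyncio  # MicroPython on the pyboard
except ImportError:
    import asyncio  # CPython, for trying the sketch off-board
import time


class TransitionLog:
    """Fixed-size record of link-state changes, kept in RAM only."""

    def __init__(self, size=32):
        self.size = size
        self.entries = []

    def add(self, state):
        # Store (timestamp, state); drop the oldest entry when full.
        self.entries.append((time.time(), state))
        if len(self.entries) > self.size:
            self.entries.pop(0)


async def watch_link(wlan, log, period_s=30, cycles=None):
    # Poll local interface state and log only when it changes.
    # cycles=None means run forever; a number limits the loop (for testing).
    last = None
    n = 0
    while cycles is None or n < cycles:
        state = (wlan.isconnected(), wlan.status())
        if state != last:
            log.add(state)
            last = state
        n += 1
        await asyncio.sleep(period_s)
```

Run as a third task alongside the other two, this would at least timestamp when the link state changed, which narrows the gap between the fault striking and a user noticing it.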
Chris