Hangs on MQTT Connection

Post by Sebastian » Tue Aug 16, 2016 5:05 pm

Hi,

I have some problems with my MQTT implementation. I'm using the robust.py implementation.
My current setup is:
WLAN router FritzBox 7490
Mosquitto running on a Raspberry Pi
1 Wemos mini board with a BM280 board, reading its values and publishing them to 3 topics every second
1 Wemos mini board with an MCP3008 chip, reading the temperature from a probe and publishing the value to a topic every 500 ms
1 Wemos mini board subscribed to these topics and showing the values on an OLED display

The setup runs quite well for a variable length of time. Sometimes it runs for 6000 loops, the next time for only 100 loops, before all of the Wemos boards stop running at the same time. There is no automatic reboot. The last log message I get is the one printed just before the publish function is called.
I watch all programs over the serial REPL. Keyboard interrupt does not work, and the boards don't respond to pings or to a WebREPL connection either.
I can only reset the Wemos boards; they then reconnect to the WLAN and everything runs again. I don't have to reset the router or restart the Raspberry Pi.
If I simulate a WLAN connection error by unplugging the WLAN router, the behavior is different: I see the log messages from the robust implementation about reconnecting. After the router has restarted, the connection is re-established and all programs carry on.

Does anyone have an idea, or has anyone seen similar behavior? I think a watchdog would help here, but as far as I know there is no watchdog on the ESP8266.

Post by pfalcon » Tue Aug 16, 2016 6:43 pm

Sebastian wrote:
I watch all programs with the serial repl. Keyboard interrupt is not working. Also they don't respond to pings or a webrepl connect.
That's why the development version of MicroPython has OS-level debugging enabled by default: it can be quite useful for diagnosing problems and for staying in the loop about what the module is doing. Please see the docs (quick reference) for how to enable it, and check whether running with it gives more insight, or at least some output.
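
For readers following along, here is a minimal sketch of switching the OS-level debug output on from a script. It assumes the `esp.osdebug()` call described in the ESP8266 quick reference; the helper name `enable_os_debug` is made up for illustration, and the import guard only exists so the snippet can also be run off-board.

```python
def enable_os_debug(uart_id=0):
    """Redirect vendor OS debug messages to a UART, if the esp module exists.

    Returns True when debugging was enabled, False when the esp module is
    not available (for example when run on a desktop Python).
    """
    try:
        import esp  # only present on the ESP8266 port of MicroPython
    except ImportError:
        return False
    esp.osdebug(uart_id)  # 0 sends the messages to UART(0), the serial REPL
    return True


enabled = enable_os_debug()
print("OS debug enabled:", enabled)
```

Calling `esp.osdebug(None)` is the documented way to turn the messages off again.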
Sebastian wrote:
I think a watchdog would help here, but as far as I know there is no watchdog at the esp8266.
There is a watchdog, and it's managed by the ESP8266's RTOS.
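
As a rough illustration only: MicroPython firmware that provides `machine.WDT` lets a script feed a watchdog from its main loop. Whether that class is available on the firmware discussed in this thread is an assumption, so the sketch falls back to doing nothing when it is missing. The helper names `make_watchdog` and `feed` are hypothetical.

```python
def make_watchdog():
    """Return a machine.WDT instance, or None when the class is unavailable."""
    try:
        from machine import WDT  # assumption: firmware that ships machine.WDT
    except ImportError:
        return None
    return WDT()  # on the ESP8266 the timeout is chosen by the system


def feed(wdt):
    """Feed the watchdog if there is one; call this once per loop iteration."""
    if wdt is not None:
        wdt.feed()


wdt = make_watchdog()
for _ in range(3):
    # ... read sensors and publish here ...
    feed(wdt)
```

If the loop stops feeding for long enough, the board resets instead of staying hung.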
Awesome MicroPython list
Pycopy - A better MicroPython https://github.com/pfalcon/micropython
MicroPython standard library for all ports and forks - https://github.com/pfalcon/micropython-lib
More up to date docs - http://pycopy.readthedocs.io/

Post by Sebastian » Tue Aug 16, 2016 10:01 pm

Hi,

First of all, thanks for the quick reply.
I activated the osdebug messages. I had forgotten that I already saw these messages when using a development version. The error is
Fatal exception 28Fatal exception 28Fatal exception 28Fatal exception 28
or
Fatal exception 28(LoadProhibitedCause):
Fatal exception 28
And the last one only logs
Fatal exception 28Fatal exception 28

Any ideas what to test?
Thanks.

Edit:
A second run shows different behavior on the Wemos that subscribes:

Fatal exception 28(LoadProhibitedCause):
epc1=0x401021d6, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

ets Jan 8 2013,rst cause:2, boot mode:(3,7)

load 0x40100000, len 30872, room 16
tail 8
chksum 0x5d
load 0x3ffe8000, len 1064, room 0
tail 8
chksum 0x0b
load 0x3ffe8430, len 3000, room 0
tail 8
chksum 0x76
csum 0x76
[garbled serial output omitted]
#5 ets_task(40100390, 3, 3fff6300, 4)
WebREPL daemon started on ws://192.168.4.1:8266
WebREPL daemon started on ws://0.0.0.0:8266
Started webrepl in normal mode
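
A small helper like the following can pull the cause code and register values out of such a dump before pasting it into a report. This is only a sketch: the function name and the shortened cause table are illustrative, although the cause numbers themselves are the standard Xtensa exception causes. Cause 28 is LoadProhibited, and an `excvaddr` of 0 means the faulting load was from address 0, which usually points to a null pointer. `epc1` is the program counter at the fault and can only be mapped to a function with the symbols of the exact firmware build.

```python
import re

# Subset of the Xtensa EXCCAUSE codes; 28 is the one seen in this thread.
EXC_CAUSES = {
    0: "IllegalInstruction",
    3: "LoadStoreError",
    9: "LoadStoreAlignment",
    28: "LoadProhibited",
    29: "StoreProhibited",
}


def parse_fatal_exception(text):
    """Return the cause code, cause name and registers found in a dump."""
    match = re.search(r"Fatal exception (\d+)", text)
    if match is None:
        return None
    cause = int(match.group(1))
    # Collect every "name=0x..." pair, e.g. epc1=0x401021d6
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"(\w+)=0x([0-9a-fA-F]+)", text)
    }
    return {"cause": cause, "name": EXC_CAUSES.get(cause, "unknown"), "regs": regs}


dump = (
    "Fatal exception 28(LoadProhibitedCause):\n"
    "epc1=0x401021d6, epc2=0x00000000, epc3=0x00000000, "
    "excvaddr=0x00000000, depc=0x00000000"
)
info = parse_fatal_exception(dump)
print(info["name"], hex(info["regs"]["epc1"]), hex(info["regs"]["excvaddr"]))
```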

Post by pythoncoder » Wed Aug 17, 2016 8:34 am

My guess (and it is just that) is that the problems may occur when your WiFi temporarily fails. WiFi sometimes has brief outages which in normal circumstances go unnoticed. It would explain the varied uptimes you are experiencing. A way to diagnose it is to set client.DEBUG = True on each of your boards. They will then output information when they are trying to reconnect.

Another possibility: what qos are you using? With your high rate of messages I'd use the default of 0, at least initially.
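
A sketch of both suggestions, for anyone trying this: set `DEBUG` on the client and publish with qos 0. `RecordingClient` is a hypothetical off-board stand-in that only mimics the `publish()` signature of the umqtt client; the topic name is made up too. On the board the client would be a `umqtt.robust.MQTTClient` instead.

```python
class RecordingClient:
    """Hypothetical stand-in for off-board runs; records what was published."""

    DEBUG = False

    def __init__(self):
        self.sent = []

    def publish(self, topic, msg, retain=False, qos=0):
        self.sent.append((topic, msg, retain, qos))


def publish_value(client, topic, value, qos=0):
    """Publish one reading as bytes; qos=0 means no acknowledgement is awaited."""
    client.publish(topic, str(value).encode(), False, qos)


client = RecordingClient()  # on the board: umqtt.robust.MQTTClient(...)
client.DEBUG = True         # robust client prints reconnect attempts when set
publish_value(client, b"sensors/temperature", 21.5)
print(client.sent)
```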
Peter Hinch
Index to my micropython libraries.

Post by Sebastian » Wed Aug 17, 2016 9:43 am

Hi pythoncoder,

Thanks for the guess. I already enabled the DEBUG messages of the client, and if I disconnect my WLAN router I get the log messages showing that the client is trying to reconnect. But these messages do not appear when the system hangs.
The qos is 0 by default and I did not change it. The strange thing is that all of the clients stop working at nearly the same time, so I suspected that the router sends out some data which triggers this error. But this is only a guess, too. I also have 3 WiPys lying around. I will try to get them working as MQTT publishers and see what they do. I may also set up a different WLAN router to check whether the problem is related to the router.
I also read your unofficial MQTT thread, but I think you have never seen such problems during your tests....
I also thought about simplifying the example so that it does not read any sensor values, to see whether the failure is related to a sensor class. But since the failure appears on 3 different modules with 3 different libraries and programs, I cannot believe that this is the problem. I also checked the free memory on each loop, and the RAM usage is stable.
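
The per-loop memory check could look roughly like this. `gc.mem_free()` exists on MicroPython only, so the sketch returns None elsewhere; the helper names and the 1024-byte tolerance are arbitrary choices for illustration.

```python
import gc


def free_memory():
    """Return free heap bytes on MicroPython, or None where gc.mem_free is missing."""
    gc.collect()
    mem_free = getattr(gc, "mem_free", None)  # MicroPython-specific function
    return mem_free() if mem_free is not None else None


def is_stable(samples, tolerance=1024):
    """True when all samples lie within `tolerance` bytes of each other."""
    return max(samples) - min(samples) <= tolerance


print("free now:", free_memory())
print(is_stable([20000, 20100, 19900]))  # small jitter
print(is_stable([20000, 15000]))         # steady loss, as with a leak
```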

Post by pythoncoder » Wed Aug 17, 2016 10:05 am

Sebastian wrote:...I think you have never seen such problems during your tests....
I'm afraid I haven't. I have run MQTT for substantial periods of time. One difference from your setup is that the rate at which I'm publishing is much lower - about once per 5 seconds. I only encountered one reliability issue which was a MicroPython memory leak (Issue #2280) but I don't believe this is relevant to your application.

It might just be instructive to check it does respond properly in the event of WiFi outages - I tested by disabling the access point but another approach is to take one or more units out of range of the AP. Another trick I've used in testing radio protocols is to pop the module in the microwave! A microwave oven (turned off ;)) acts as an effective Faraday cage, shielding the unit from the transmitter. Close the door gradually and you get a progressive loss of signal.

But to be honest I don't think this is the source of your problem. I'm stumped. :(

Post by pfalcon » Fri Aug 19, 2016 4:56 pm

Sebastian wrote:
Edit:
Second run shows different behavior on the wemos which subscribed:

Fatal exception 28(LoadProhibitedCause):
epc1=0x401021d6, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
Sorry for the delay in replying. This does show a real bug, and so that bugs are not forgotten, they really should be reported at https://github.com/micropython/micropython/issues . To proceed further I'd need from you:
  • If you build MicroPython from source: firmware.elf and firmware.map (plus an up-to-date exception dump like the one above, produced on that exact firmware)
  • If you use a prebuilt version: that's actually a bit more complicated, but we can try with a link to the exact binary and an exception dump from that exact version.
Thanks.

Post by Sebastian » Sat Aug 20, 2016 6:34 am

Thanks for the reply. All of the boards are running the latest stable release, esp8266-20160809-v1.8.3.bin, from the download section.
I previously used some self-compiled versions between 1.8.2 and 1.8.3 and had similar failures, but I would say that with those the boards more often rebooted with the failure above rather than hanging without any response. Should I open a GitHub issue with this information, or is it more helpful if I compile a version and try to reproduce the failure with that version?

I also did some tests with WiPys. The behaviour is different. I set up two devices, one as a publisher and one as a subscriber. The subscriber often stops working with an OSError 8 or something similar. I'm not sure why, but that is a different topic. The publisher ran perfectly through the whole night without any problems, while the Wemos boards stopped working after some time.

Post by pfalcon » Sat Aug 20, 2016 11:35 am

Yes, definitely please open a GitHub ticket. I'm busy these weeks, so it may take some time before I get to investigate the issue, and a ticket will make sure it's not lost.

As for building it yourself: from the project's perspective, the more people build uPy from source, and the more frequently, the better ;-). It depends on your time and availability, of course. Note that the ideal feedback from you would be a reduced test case which quickly shows the issue (as in: I run it and get that exception). I understand that in this case it looks like a non-deterministic problem.
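
A reduced test case in that spirit might be as small as the loop below: no sensors, just a counter, with a print on both sides of the publish call so the last line on the serial console shows where it stopped. This is a sketch under assumptions; `run_publisher`, the topic, and the broker address and client id in the comment are all made up.

```python
import time


def run_publisher(publish, topic=b"test/counter", loops=100, delay_s=0.5):
    """Call publish(topic, payload) `loops` times, logging around each call."""
    for i in range(loops):
        print("loop", i, "before publish")
        publish(topic, str(i).encode())
        print("loop", i, "after publish")
        time.sleep(delay_s)
    return loops


# On the board (assumed broker address and client id):
#   from umqtt.robust import MQTTClient
#   c = MQTTClient("repro", "192.168.1.10")
#   c.connect()
#   run_publisher(c.publish, loops=10000, delay_s=0.5)

# Off-board dry run with a list in place of the broker:
sent = []
count = run_publisher(lambda t, m: sent.append((t, m)), loops=3, delay_s=0)
```

Taking the publish function as a parameter keeps the loop independent of the MQTT library, so the same script can be tried against a different client.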

Post by Sebastian » Mon Aug 22, 2016 6:02 pm

Hi pfalcon,
You said that
pfalcon wrote:
There's a watchdog, and it's managed by ESP8266's RTOS.
But does the commit https://github.com/micropython/micropyt ... 32ed32d500 mean that the watchdog was disabled on all versions?
