Dowloading Gigabytes reloaded

Posted: Sun Nov 20, 2016 10:07 am
by Roberthh
Hello folks,
let me start with a statement:

The ESP8266 with MicroPython is a damn good device, and Paul did admirable work to get it all running! One should never forget that, least of all when fighting with a problem, which most of the time is your own anyhow.

During my last test it puzzled me that the ESPs sometimes flagged a TIMEOUT error out of a full run, meaning that the k-bytes count display had not stopped beforehand. To rule out the Internet as the cause, I prepared a local server which, when called, simply spits out a 25 MByte stream of data in 1 kByte chunks. I ran that server both on my local PC and on an otherwise idle Cubietruck/Debian board, which I use as a NAS. To match that, I simplified the download client a little. For testing I used a WEMOS D1 and a Huzzah Feather, and for comparison a WiPy 1 and a LoPy. The Python scripts and a summary table are attached. Findings:
  • For ESP8266 devices, running as access point (AP_IF mode) is completely unreliable.
  • ESP8266 devices running in Station mode still fail, sometimes with a crash, but most of the time with a TIMEOUT. Since the server is local, that should not happen.
  • The timeout/crash behaviour can be influenced by the speed at which the server supplies the data. I added delays between the 1k chunks.
  • Both WiPy and LoPy ran like clockwork. No timeout, no crash. WiPy 1 was the fastest bird on the road.
That indicates that there is still room for improvement with regard to the reliability of the network stack. But at the rate at which the errors occur, that may be very hard to track down. Are there any means available that would help to trace such rare events?
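For anyone who wants to reproduce the setup without the attachment, a server of the kind described above might look roughly like this. It is a minimal sketch and not the script from the zip file: the function names, port 8080, the filler byte and the use of a raw TCP stream without any HTTP framing are all assumptions.

```python
import socket
import time

CHUNK = b"x" * 1024          # one 1 KiB chunk of filler data (content is arbitrary)
TOTAL_CHUNKS = 25 * 1024     # 25 MiB in total


def make_listener(host="0.0.0.0", port=8080):
    """Create a listening TCP socket; port 8080 is an arbitrary choice."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def serve_once(listener, total_chunks=TOTAL_CHUNKS, delay_s=0.0):
    """Accept one client and stream total_chunks chunks of 1 KiB to it.

    delay_s is the pause inserted after each chunk.
    Returns the number of bytes sent.
    """
    conn, _addr = listener.accept()
    sent = 0
    try:
        for _ in range(total_chunks):
            conn.sendall(CHUNK)
            sent += len(CHUNK)
            if delay_s:
                time.sleep(delay_s)
    finally:
        conn.close()
    return sent
```

Calling serve_once(make_listener(), delay_s=0.01) would serve a single download with a 10 ms pause between chunks; varying delay_s is the knob for the speed dependence mentioned in the findings.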
test_reloaded.zip (26.5 KiB)
P.S.: I have some tty logs of the crashes, like this one of the crash matching the first line of the table, but to me they do not tell much.

Code:

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x40100000, len 32020, room 16 
tail 4
chksum 0xf6
load 0x3ffe8000, len 1096, room 4 
tail 4
chksum 0xb8
load 0x3ffe8450, len 3000, room 4 
tail 4
chksum 0xfd
csum 0xfd
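When sifting through long tty captures for resets like this one, a small script run on the PC can pull out the boot banners. The following is only a sketch: the function names are made up, and of the cause codes only 4 is confirmed by the log above (it prints "wdt reset"); the other two table entries are the commonly cited meanings and should be treated as an assumption.

```python
import re

# Only cause 4 is confirmed by the log above ("wdt reset"); the other
# entries are commonly cited meanings and are an assumption here.
RST_CAUSES = {
    1: "power-on",
    2: "external reset",
    4: "hardware watchdog reset",
}

BOOT_LINE = re.compile(r"rst cause:\s*(\d+),\s*boot mode:\s*\((\d+),\s*(\d+)\)")


def parse_boot_line(line):
    """Return (cause_number, cause_text, boot_mode) or None if no match."""
    m = BOOT_LINE.search(line)
    if m is None:
        return None
    cause = int(m.group(1))
    mode = (int(m.group(2)), int(m.group(3)))
    return cause, RST_CAUSES.get(cause, "unknown"), mode


def count_wdt_resets(log_text):
    """Count boot banners in a tty log that report reset cause 4."""
    count = 0
    for line in log_text.splitlines():
        parsed = parse_boot_line(line)
        if parsed is not None and parsed[0] == 4:
            count += 1
    return count
```

Feeding a whole captured session to count_wdt_resets() gives the number of watchdog resets during a run, which is easier to compare across test configurations than reading the raw log.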

Re: Dowloading Gigabytes reloaded

Posted: Sun Nov 20, 2016 11:42 pm
by dwight.hubbard
What is the behaviour when this occurs? Is the board hanging completely?

I know I'm seeing intermittent hangs off esp8266 boards that require a hard reset to recover.

Re: Dowloading Gigabytes reloaded

Posted: Mon Nov 21, 2016 6:26 am
by Roberthh
As far as I recall, in these tests the board reboots and comes up with the normal boot sequence. But I have also had occurrences with other code where the board just went silent and did not react, not even to Ctrl-C. Either way is wrong.

Re: Dowloading Gigabytes reloaded

Posted: Mon Nov 21, 2016 2:18 pm
by jms
Dwight makes a good point. Do not use the term "crash". Say what happened where it deviates from what was expected.

Would also be good if you confirmed the power supply arrangement and any local decoupling (which as I have said you really ought to have).

How long (time and data) does it typically run for? Does it have anything to do with the data? Is your unit able to crunch without network data transfer without rebooting itself?

Re: Dowloading Gigabytes reloaded

Posted: Mon Nov 21, 2016 3:13 pm
by Roberthh
Hello @jms, the answers to most of your questions are in the text.
  • Crash means unexpected reset, reported as wdt reset, when it should continue to communicate.
  • Power supply = 5V/3A old linear bench supply, ~4 inch wire soldered to the board, at least 10uF||100nF decoupling at the board (see previous test). In the previous test run I also tried larger capacitors w/o any difference in behaviour.
  • Time = (total data amount)/(data rate), as can be determined from the table.
  • No, it is not related to the content of the data. It is somehow influenced by the speed at which the server sends data, so it may be a timing-dependent phenomenon (I/O event in a specific context).
  • W/o any data transfer, the unit continues to run. I did not specifically test for that here, but the longest stretch I had was about 10 days, after which I did something else with the unit. The test was for the FTP server, which ran, and now and then I connected to it, just to see if it still responded. So in that test it had TCP/IP transfer, but at a very low rate.
If you have any suspicion about the reason for that behaviour, please tell. If you would like to join the testing, please do so. You do not need more than an ESP8266 device and your PC.

Re: Dowloading Gigabytes reloaded

Posted: Tue Nov 22, 2016 7:30 am
by pythoncoder
@Roberthh Given that the outcome of some of your tests was affected by the method of powering the ESP8266, I would suggest that all testing should be done with an external PSU. The issue with USB power casts doubt over all results acquired that way, in my view.

Re: Dowloading Gigabytes reloaded

Posted: Tue Nov 22, 2016 9:13 am
by Roberthh
@pythoncoder: My impression in the test was that the power supply is not the only reason for the unexpected behaviour. If the decoupling on the board is good, then it behaves similarly with USB supply and direct supply. But using a direct supply definitely rules out one source of error, and therefore I agree in preferring a direct power supply for testing. But the reason for the glitch is still totally unclear. Does someone know if the ESP8266 outputs more detailed information when running into such a situation, e.g. on USB 1?
B.t.w.: The WiPy and LoPy were powered by USB through the expansion board.

Re: Dowloading Gigabytes reloaded

Posted: Tue Nov 22, 2016 12:13 pm
by jms
May I suggest the same test written in C to determine whether the problem is MicroPython (as implemented on the ESP).

Jon

Re: Dowloading Gigabytes reloaded

Posted: Wed Nov 23, 2016 7:35 pm
by ernitron
Hi Robert and all
actually I am not sure if what you see can be something related to what I tested here http://forum.micropython.org/viewtopic.php?f=16&t=2697.

WiFi instability is reflected in the application, that is for sure. Some hangups are explained by an intermittent WiFi signal. That's my conclusion.

My approach would be to check whether the connection is still there and try to recover (rebooting?) instead of letting the underlying stack handle it. But it would be cumbersome for the application. Maybe that should also be addressed at the API level (because in the Espressif SDK there is nothing we can do).
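A check-and-recover step of that kind could be sketched as below. It is only an illustration of the idea: ensure_connected and its parameters are made-up names, and the three callables are passed in so that the logic can be tried on a PC without a board.

```python
import time


def ensure_connected(is_connected, reconnect, give_up,
                     retries=3, wait_s=5.0, sleep=time.sleep):
    """Check the link, try to reconnect, call give_up() if all retries fail.

    Returns True if the link is up (or comes back), False after give_up().
    """
    if is_connected():
        return True
    for _ in range(retries):
        reconnect()
        sleep(wait_s)            # give the station time to associate
        if is_connected():
            return True
    give_up()                    # e.g. reboot the board as a last resort
    return False


# On an ESP8266 this might be wired up as follows (SSID and PASSWORD are
# placeholders):
#   import network, machine
#   wlan = network.WLAN(network.STA_IF)
#   ensure_connected(wlan.isconnected,
#                    lambda: wlan.connect(SSID, PASSWORD),
#                    machine.reset)
```

The application would have to call this at suitable points, for instance before each transfer, which is exactly the cumbersome part.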

What do you think?

Re: Dowloading Gigabytes reloaded

Posted: Wed Nov 23, 2016 8:41 pm
by Roberthh
Hello @ernitron. WiFi may be unstable, and broken connections can be handled in the application, that's agreed.
But I repeated the download test in a more robust set-up (no Internet, no remote server, no service provider which resets the ADSL line after 24h), since I had a suspicion that the connection failed even with a stable WiFi set-up. And that seems to be the result of the test, especially because the ESP stumbles while the WiPy and LoPy run like clockwork. Still, these failures can be treated as "just another WiFi problem" and dealt with like that.
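Dealing with it "like that" in the application could mean wrapping the download in a timeout and a retry loop, roughly as in this sketch. It is written for CPython and is not the client from the attachment; the function names and default values are assumptions, and on MicroPython the address would normally come from socket.getaddrinfo().

```python
import socket


def download(host, port, timeout_s=10.0, bufsize=1024):
    """Read from host:port until the server closes; return bytes received.

    Raises OSError (a socket timeout is a subclass) if the stream stalls.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout_s)
    total = 0
    try:
        sock.connect((host, port))
        while True:
            data = sock.recv(bufsize)
            if not data:          # server closed the connection: done
                return total
            total += len(data)
    finally:
        sock.close()


def download_with_retry(host, port, attempts=3, **kwargs):
    """Call download() up to `attempts` times; re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return download(host, port, **kwargs)
        except OSError as exc:
            last_error = exc      # timeout, reset, refused, ...
    raise last_error
```

Note that each retry starts the transfer from scratch, so this hides the failure rather than explaining it; counting how often the except branch is taken would still be worthwhile.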