Re: optimizing uasyncio performance

Posted: Sat Dec 08, 2018 7:12 am
by rpr
I ran the script you posted. I am getting times of around 1.6-1.7 ms on an ESP32 with 4 MB PSRAM, running a build of the standard MicroPython repo with uasyncio included as a frozen module. There were other modules in that frozen directory as well.

Code:

  -- idle --
  -- task1 --
  -- task2 --
asyncio.sleep(0) = 1701 us
2801392
  -- idle --
  -- task1 --
2801280
  -- idle --
  -- task2 --
asyncio.sleep(0) = 1633 us
2800944
  -- idle --
2800896
  -- idle --
  -- task1 --
2800784
  -- idle --
  -- task2 --
asyncio.sleep(0) = 1688 us
2800448
  -- idle --
2800400
  -- idle --
  -- task1 --
2800288
  -- idle --
  -- task2 --
asyncio.sleep(0) = 1648 us
2799952
  -- idle --
2799904
  -- idle --
  -- task1 --
2799792
  -- idle --
  -- task2 --
asyncio.sleep(0) = 1669 us
2799456
  -- idle --
2799408
  -- idle --
  -- task1 --
2799296
  -- idle --
  -- task2 --
asyncio.sleep(0) = 1678 us
2798960
  -- idle --
2798912
  -- idle --
  -- task1 --
2798800
  -- idle --
  -- task2 --
asyncio.sleep(0) = 1619 us
2798464
  -- idle --
2798416
  -- idle --
  -- task1 --
2798304
  -- idle --
  -- task2 --
asyncio.sleep(0) = 1606 us
2797968
  -- idle --
  -- task1 --
2797856
  -- idle --
  -- task2 --
asyncio.sleep(0) = 1650 us

Re: optimizing uasyncio performance

Posted: Sat Dec 08, 2018 8:52 am
by kevinkk525
That is still surprisingly high but a lot better.
It does not help him, though, as he needs the Loboris port because of the changes he made for the I2S microphone.

Re: optimizing uasyncio performance

Posted: Sat Dec 08, 2018 1:32 pm
by mattyt
Or Mike's I2S modifications could be ported to mainline MicroPython. They don't look particularly tied to the Loboris fork...

Re: optimizing uasyncio performance

Posted: Sat Dec 08, 2018 4:02 pm
by fstengel
I just found this thread and decided to test the example on an M5Stack with 4 MB PSRAM using a stock Lobo firmware (ESP32_LoBo_v3.2.24 - 2018-09-06; psram_all).

If I have the ftp and mDNS services running:
  • await asyncio.sleep_ms(0): around 3.7-3.8 ms
  • yield: around 2.7-2.8 ms

If I have nothing but wifi on:
  • await asyncio.sleep(0): around 2.3 ms
  • await asyncio.sleep_ms(0): around 3.0 ms
  • yield: around 2.2 ms
So, unsurprisingly, services running concurrently have quite an impact on MicroPython performance.
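
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how the average cost of `await asyncio.sleep(0)` could be timed while a second task competes for the scheduler. It is not the script from earlier in the thread: it is written for desktop Python's `asyncio` so it runs anywhere, and the names `idle`, `measure` and `run_benchmark` are my own. On MicroPython one would presumably swap in `uasyncio` and `time.ticks_us()` / `time.ticks_diff()`.

```python
import asyncio
import time


async def idle(counter):
    # Competing task: hands control back to the scheduler on every pass.
    while True:
        counter[0] += 1
        await asyncio.sleep(0)


async def measure(n=1000):
    # Time n round trips through the scheduler while idle() is also runnable.
    counter = [0]
    other = asyncio.ensure_future(idle(counter))
    start = time.perf_counter_ns()
    for _ in range(n):
        await asyncio.sleep(0)
    elapsed_us = (time.perf_counter_ns() - start) / 1000
    other.cancel()
    try:
        await other
    except asyncio.CancelledError:
        pass
    # Average microseconds per sleep(0), and how often idle() got to run.
    return elapsed_us / n, counter[0]


def run_benchmark(n=1000):
    return asyncio.run(measure(n))
```

Calling `run_benchmark()` returns the average time per reschedule in microseconds together with the number of passes the competing task made. Absolute numbers from a desktop are of course not comparable with the ESP32 figures above.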

Re: optimizing uasyncio performance

Posted: Sat Dec 08, 2018 4:51 pm
by pythoncoder
Interesting that sleep_ms() is consistently slower than sleep() despite the latter's use of floating point. I stand corrected on that ;)

I have doubts about uasyncio coping with I2S. It depends on how the I2S code is implemented. uasyncio can handle data streams on UARTs because the UART interface uses interrupts and buffering. The buffer is filled by the ISR, below the radar of uasyncio, and so long as the buffer is emptied faster than the ISR fills it, all is well, even with multiple competing coros. The buffer needs to be big enough to cope with the worst-case latency imposed by the other coros.
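
As a rough illustration of the buffer sizing argument: the buffer must hold at least as many bytes as arrive during the longest period in which the consuming coro does not get to run. The sketch below is only that arithmetic. The function name and the sample figures are hypothetical and are not taken from any I2S driver.

```python
import math


def min_buffer_bytes(sample_rate_hz, bytes_per_sample, channels, worst_latency_ms):
    # Bytes produced by the peripheral while the reader is blocked
    # for worst_latency_ms, rounded up to a whole byte.
    bytes_in_window = sample_rate_hz * bytes_per_sample * channels * worst_latency_ms
    return math.ceil(bytes_in_window / 1000)


# Hypothetical example: 16 kHz mono, 32-bit samples, 4 ms worst-case latency.
example = min_buffer_bytes(16000, 4, 1, 4)
```

With those assumed figures the minimum comes out at 256 bytes. In practice the worst-case latency is the sum of the longest uninterrupted run of every other coro, so a real buffer would want a generous margin on top.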

A few years ago someone tried to implement I2S on the Pyboard using DMA but never managed to get it working without glitches. How does the Loboris version work? Is the buffer size configurable?

Re: optimizing uasyncio performance

Posted: Sat Dec 08, 2018 5:02 pm
by kevinkk525
I thought those values for the ESP32 were high, but testing on my ESP8266 running at 80 MHz I got:
sleep(0): 2.1 ms
sleep_ms(0): 2.1 ms
yield: 1.5 ms

Re: optimizing uasyncio performance

Posted: Sat Dec 08, 2018 8:35 pm
by fstengel
That does indeed seem high: my M5Stack's CPU is clocked at 240 MHz, so one would expect much shorter delays. I remember reading somewhere that the Lobo MicroPython fork uses "double" integer arithmetic. Could that be an issue here?

Re: optimizing uasyncio performance

Posted: Mon Dec 10, 2018 3:43 pm
by Mike Teachman
rpr wrote:
Sat Dec 08, 2018 7:12 am
I ran the script you posted. I am getting times of around 1.6-1.7 ms on an ESP32 with 4 MB PSRAM, running a build of the standard MicroPython repo with uasyncio included as a frozen module. There were other modules in that frozen directory as well.
Thanks a lot for trying this testcase using the mainline repo. I appreciate your time to help me with this issue.
In addition to the significant reduction in the time to re-schedule Task2, your results show vastly better performance for heap use:
  • on the Lobo build, every time Task2 runs the heap goes down by approximately 50k bytes
  • on the mainline build, every time Task2 runs the heap goes down by approximately 350 bytes
That is a difference of over two orders of magnitude in heap consumption... interesting.
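
To show where the "approximately 350 bytes" comes from, here is a small sketch that takes the first few heap figures from rpr's log and computes the step between consecutive readings. I am assuming those numbers are free heap in bytes. The 50,000 figure for the Lobo build is the approximate value quoted above, not a new measurement.

```python
# First nine heap readings from the mainline log, in order of appearance.
heap_free = [2801392, 2801280, 2800944, 2800896, 2800784,
             2800448, 2800400, 2800288, 2799952]


def heap_drops(values):
    # Bytes consumed between each pair of consecutive readings.
    return [a - b for a, b in zip(values, values[1:])]


drops = heap_drops(heap_free)
largest_drop = max(drops)

# Approximate per-run heap use on the Lobo build, as quoted in this post.
LOBO_DROP_BYTES = 50_000
ratio = LOBO_DROP_BYTES / largest_drop
```

The drops repeat as 112, 336 and 48 bytes. The 336-byte step is the one that follows each task2 line in the log, and 50,000 / 336 is roughly 149, which is where "over two orders of magnitude" comes from.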

Re: optimizing uasyncio performance

Posted: Mon Dec 10, 2018 3:46 pm
by Mike Teachman
fstengel wrote:
Sat Dec 08, 2018 4:02 pm
I just found this thread and decided to test the example on an M5Stack with 4 MB PSRAM using a stock Lobo firmware (ESP32_LoBo_v3.2.24 - 2018-09-06; psram_all).

If I have the ftp and mDNS services running:
  • await asyncio.sleep_ms(0): around 3.7-3.8 ms
  • yield: around 2.7-2.8 ms

If I have nothing but wifi on:
  • await asyncio.sleep(0): around 2.3 ms
  • await asyncio.sleep_ms(0): around 3.0 ms
  • yield: around 2.2 ms
So, unsurprisingly, services running concurrently have quite an impact on MicroPython performance.
Really insightful. Thanks for trying this. It is a good example to show the performance gains that can be made by turning off unused features in the build.

Re: optimizing uasyncio performance

Posted: Mon Dec 10, 2018 3:48 pm
by kevinkk525
It might have something to do with time.ticks_ms() returning a 64-bit integer on the Loboris port, which has to be allocated on the heap. In a post on his forum he says that this behaviour will change in the next update: time.ticks_ms() will only return a 64-bit integer if the value needs it, and otherwise it will return a small int that does not need heap space.
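
A rough sketch of the small int idea, for illustration only. As far as I know, mainline MicroPython on 32-bit ports stores integers that fit in 31 signed bits directly in the object word, so they cost no heap. Whether the Loboris port uses exactly the same layout is an assumption on my part, and the helper name below is made up.

```python
SMALL_INT_BITS = 31  # assumed width of a small int on a 32-bit port


def fits_small_int(value, bits=SMALL_INT_BITS):
    # True if value fits in a signed integer of the given width,
    # i.e. could be held without a heap-allocated big integer.
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


# A millisecond counter stays in small int range for about this many days.
days_in_range = (1 << (SMALL_INT_BITS - 1)) / 1000 / 86400
```

Under that assumption a millisecond tick count would stay allocation-free for roughly the first 12 days of uptime, which would cover most test runs like the ones in this thread.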