optimizing uasyncio performance

All ESP32 boards running MicroPython.
Target audience: MicroPython users with an ESP32 board.
User avatar
Mike Teachman
Posts: 44
Joined: Mon Jun 13, 2016 3:19 pm
Location: Victoria, BC, Canada

optimizing uasyncio performance

Post by Mike Teachman » Tue Dec 04, 2018 7:06 pm

I'm hoping a uasyncio expert can comment on performance challenges I'm facing. I'm not a uasyncio expert and suspect that I am using the library incorrectly.

Background
I am using the usayncio library in a uPy project to manage operation of sensors, a display, and an I2S microphone. It was all coming together until I starting seeing performance issues with the I2S microphone - gaps were appearing in the audio sample stream.

The I2S microphone code runs in a coro loop that looks like this:
a) read 2048 bytes of audio samples from the microphone using I2S
b) call await asyncio.sleep(0) to yield
c) prune out unused bytes, 2048->512 bytes
d) call await asyncio.sleep(0) to yield
e) write 512 bytes to WAV file on a SD Card
f) call await asyncio.sleep(0) to yield
g) loop back to a)

The ambition is a gap-less audio stream recorded to a WAV file. It works, except when I implement with uasyncio ...
Here is the file being discussed:
https://github.com/MikeTeachman/micropy ... etsense.py
in this repo:
https://github.com/MikeTeachman/micropy ... reet-sense


Performance challenges with uasyncio

1. calls to await asyncio.sleep(0) in this application consistency take > 6ms to run, sometimes 10ms -- this adds delays into the microphone read-write loop, putting constraints on the sample rate that can be used.
https://github.com/MikeTeachman/micropy ... se.py#L283
2. frequent garbage collection

Environment:
- I am using the ESP32 LoBo build, with customization to support I2S
- board is Lolin D32 Pro (4MB PSRAM)

Here is a testcase illustrating both issues. You will see the call to await asyncio.sleep(0) taking > 3ms to return and rapid "consumption" of the uPy heap.

Testcase

Code: Select all

import utime
import uasyncio as asyncio
import asyn
import gc

async def idle():
    while True:
        print("  -- idle --")
        await asyncio.sleep(1)
        utime.sleep_us(1) # workaround for watchdog trigger issue in LoBo port
        print(gc.mem_free())

async def task1(b):
    while True:
        print("  -- task1 --")
        await b 
        await asyncio.sleep(2)        
        
async def task2(b):
    t0 = 0
    t1 = 0
    while True:
        print("  -- task2 --")
        t0 = utime.ticks_us()
        await asyncio.sleep(0) 
        t1 = utime.ticks_us()
        print("asyncio.sleep(0) = {} us".format(t1-t0))
        await b        
        await asyncio.sleep(3)
        
loop = asyncio.get_event_loop()
barrier = asyn.Barrier(2) 
loop.create_task(idle()) 
loop.create_task(task1(barrier)) 
loop.create_task(task2(barrier))  
loop.run_forever()
testcase output:

Code: Select all

  -- idle --
  -- task1 --
  -- task2 --
asyncio.sleep(0) = 2464 us
472800
  -- idle --
472400
  -- idle --
  -- task1 --
  -- task2 --
asyncio.sleep(0) = 3149 us
417184
  -- idle --
416784
  -- idle --
  -- task1 --
414976
  -- idle --
  -- task2 --
asyncio.sleep(0) = 3083 us
361264
  -- idle --
360864
  -- idle --
  -- task1 --
357808
  -- idle --
  -- task2 --
asyncio.sleep(0) = 3087 us
305248
  -- idle --
304848
  -- idle --
  -- task1 --
300448
  -- idle --
  -- task2 --
asyncio.sleep(0) = 3052 us
249040
  -- idle --
248640
  -- idle --
  -- task1 --
242992
  -- idle --
  -- task2 --
asyncio.sleep(0) = 3112 us
193312
  -- idle --
192912
  -- idle --
  -- task1 --
185824
  -- idle --
  -- task2 --
asyncio.sleep(0) = 3099 us
137104
  -- idle --
136704
  -- idle --
  -- task1 --
128368
  -- idle --
  -- task2 --
asyncio.sleep(0) = 3223 us
81088
  -- idle --
80688
  -- idle --
  -- task1 --
71008
  -- idle --
  -- task2 --
asyncio.sleep(0) = 3168 us
25168
  -- idle --
24768
  -- idle --
  -- task1 --
13744
  -- idle --
  -- task2 --
asyncio.sleep(0) = 3056 us
455360
  -- idle --
454960
  -- idle --
  -- task1 --
Does anyone have suggestions to improve the performance?

thanks!

kevinkk525
Posts: 216
Joined: Sat Feb 03, 2018 7:02 pm

Re: optimizing uasyncio performance

Post by kevinkk525 » Tue Dec 04, 2018 8:31 pm

Do you have other coroutines running?

You could try using a uasyncio version with priority queues to ensure that the mic coro is called faster: https://github.com/peterhinch/micropyth ... er/fast_io

User avatar
Mike Teachman
Posts: 44
Joined: Mon Jun 13, 2016 3:19 pm
Location: Victoria, BC, Canada

Re: optimizing uasyncio performance

Post by Mike Teachman » Wed Dec 05, 2018 6:11 am

kevinkk525 wrote:
Tue Dec 04, 2018 8:31 pm
Do you have other coroutines running?
I don't think so ... When await asyncio.sleep(0) is called all other coroutines should either be blocked waiting on a Barrier, or sleeping. I expected the scheduler to quickly return control to the coroutine that called await asyncio.sleep(0). The delay of 6+ms is unexpected. That seems excessive when no other coroutines are waiting to run. (which makes me think I'm doing something wrong ...)

I tried replacing the mainline usyncio with the fast_io version, but I still see the same delay after calling asyncio.sleep(0).

kevinkk525
Posts: 216
Joined: Sat Feb 03, 2018 7:02 pm

Re: optimizing uasyncio performance

Post by kevinkk525 » Wed Dec 05, 2018 9:21 am

6ms does seem a bit long if nothing else is running. maybe @pythoncoder knows more about it.

Do you have any feature of loboris port running like ftp? They spawn a new thread making the main thread slower. Maybe this is the problem?

User avatar
pythoncoder
Posts: 3121
Joined: Fri Jul 18, 2014 8:01 am
Location: UK
Contact:

Re: optimizing uasyncio performance

Post by pythoncoder » Wed Dec 05, 2018 5:19 pm

For a nominal zero delay it is quicker to issue asyncio.sleep_ms(0) and quicker still to issue yield. However whenever you issue one of these, every other coro will get scheduled before it returns. Note that the Barrier runs a (minimal) coro while a task is waiting on it.

These delays soon add up. That said, a quick run on a Pyboard shows values around 450μs so a lot is down to the nature of the ESP32.
Peter Hinch

User avatar
Mike Teachman
Posts: 44
Joined: Mon Jun 13, 2016 3:19 pm
Location: Victoria, BC, Canada

Re: optimizing uasyncio performance

Post by Mike Teachman » Fri Dec 07, 2018 3:48 pm

kevinkk525 wrote:
Wed Dec 05, 2018 9:21 am
Do you have any feature of loboris port running like ftp? They spawn a new thread making the main thread slower. Maybe this is the problem?
Thanks for this idea. I tried turning off Telnet, FTP, and MQTT. This gave about a 5% performance improvement. Not very significant, but I'll take it!

User avatar
Mike Teachman
Posts: 44
Joined: Mon Jun 13, 2016 3:19 pm
Location: Victoria, BC, Canada

Re: optimizing uasyncio performance

Post by Mike Teachman » Fri Dec 07, 2018 4:18 pm

pythoncoder wrote:
Wed Dec 05, 2018 5:19 pm
For a nominal zero delay it is quicker to issue asyncio.sleep_ms(0) and quicker still to issue yield.
Thanks for these insights! .. here are the test results:

asyncio.sleep(0) = 6.6 ms delay
asyncio.sleep_ms(0) = 7.0 ms delay
yield = 6.5 ms delay

Yield came out on top, and the _ms() version is slowest to return. Note: I am using the uasyncio release in micropython-lib.

Unfortunately, still not good enough for the microphone handling part of this project. Next steps:
1. "forensics" -- enable debug logging in uasyncio and uasyncio.core to understand what is creating this delay. Perhaps some simple tuning in core.py can bring significant improvements?
2. possibly create a new _thread dedicated to the I2S microphone handler. And, keep using uasyncio in the uPy main thread to manage all the other sensors, display, MQTT, buttons, etc (which are not time critical).

kevinkk525
Posts: 216
Joined: Sat Feb 03, 2018 7:02 pm

Re: optimizing uasyncio performance

Post by kevinkk525 » Fri Dec 07, 2018 4:53 pm

Option 2 is what I'd try. It sounds like the best option for this scenario.

rpr
Posts: 15
Joined: Sat Oct 27, 2018 5:17 pm

Re: optimizing uasyncio performance

Post by rpr » Fri Dec 07, 2018 5:48 pm

I'm a total newbie just learning. Is it possible somehow to dedicate one of the two cores for a given task.

User avatar
Mike Teachman
Posts: 44
Joined: Mon Jun 13, 2016 3:19 pm
Location: Victoria, BC, Canada

Re: optimizing uasyncio performance

Post by Mike Teachman » Fri Dec 07, 2018 7:27 pm

Today, it's not possible to dedicate a uPy operation to a specific ESP32 core.

The closest I've seen to this ambition is an experimental dual-core option on the LoBo port . As I understand it, with this option, the uPy VM runs on one core, and the rest of the ESP32 core rtos tasks (wifi, etc) run on the other core. In practice, I haven't seen much improvement in the dual-core mode compared to running everything on one core.

I read somewhere that there is an effort underway (by Loboris maybe?) to enable splitting uPy operation between the two cores. That would help with my application I think.

Post Reply