Crime and punishment: uasyncio, non-blocking SPI, and network errors

Post by **oclyke** » Wed Nov 10, 2021 3:29 am

I'm seeing a perplexing issue and I'm hoping someone familiar with the ESP32 can point me in the right direction.

Using uasyncio, I drive an addressable LED output at ~30 Hz and also run some basic network operations (fetching HTTP data) whenever a connection is detected. I've run out of CPU time for my animations, so I hoped to make my SPI bus non-blocking. The ESP32 port already uses the DMA support in the IDF SPI driver, but it blocks the calling task until the transaction(s) complete. For a quick test I simply commented out the places where that wait happens. They look like this:

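```c
// Release the GIL, block until the driver hands back the finished
// transaction, then re-acquire the GIL.
MP_THREAD_GIL_EXIT();
spi_device_get_trans_result(self->spi, &result, portMAX_DELAY);
MP_THREAD_GIL_ENTER();
```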
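
Removing that wait means the call can return while the DMA engine may still be reading the source buffer. The C sketch below is a hypothetical model (the `model_*` names are illustrative, not ESP-IDF functions) of a transfer that captures only a pointer at queue time and reads the data later. It shows how a buffer the caller reuses before completion puts the new bytes on the bus. Whether the real driver reads the caller's buffer directly or a DMA-capable copy depends on where that buffer lives, so treat the model as an assumption to check against the IDF source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not the ESP-IDF API: a DMA engine that keeps only a
 * pointer to the source data at queue time and reads it when the transfer
 * actually runs, some time after the queue call has returned. */
typedef struct {
    const uint8_t *src; /* captured at queue time; nothing is copied yet */
    size_t len;
    uint8_t wire[8];    /* bytes that actually went out on the bus */
} model_dma_t;

/* Non-blocking queue: record the request and return immediately. */
static void model_queue(model_dma_t *d, const uint8_t *src, size_t len) {
    assert(len <= sizeof d->wire);
    d->src = src;
    d->len = len;
}

/* Runs "later", when the hardware reaches this transfer. */
static void model_complete(model_dma_t *d) {
    memcpy(d->wire, d->src, d->len);
}
```

For example, queue a 4-byte `frame` holding `{1, 2, 3, 4}`, overwrite `frame[0]` with 99, then call `model_complete`: `wire[0]` ends up as 99. With the stock blocking wait, completion happens before control returns to the caller, so this overwrite cannot occur.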

I figured this *could* cause problems in some scenarios, such as:
* too many transactions being queued before the DMA engine can process them (overflowing some queue somewhere)
* failures in drivers that rely on SPI transactions being blocking
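
One way to address the first concern, sketched here as a self-contained C model rather than real driver code, is to keep a fixed pool of in-flight descriptors in static storage and, when the pool is full, collect the oldest result before queueing another. `QUEUE_DEPTH`, `fake_trans_t`, `queue_nonblocking` and `reap_all` are hypothetical names; in the actual port the reaping step would correspond to `spi_device_get_trans_result()`, which blocks until that transfer is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical depth. In ESP-IDF the analogous limit is the queue_size
 * given when the SPI device is added to the bus. */
#define QUEUE_DEPTH 2

typedef struct {
    const uint8_t *tx_buffer; /* must stay valid until this entry is reaped */
    size_t length;
} fake_trans_t;

/* Static storage, so descriptors outlive the call that queued them. */
static fake_trans_t pool[QUEUE_DEPTH];
static size_t head = 0;      /* index of the oldest in-flight descriptor */
static size_t in_flight = 0; /* descriptors queued but not yet reaped */

/* Collect the oldest result. In the real port this step would be
 * spi_device_get_trans_result(), which blocks until that transfer ends. */
static bool reap_one(void) {
    if (in_flight == 0) {
        return false;
    }
    head = (head + 1) % QUEUE_DEPTH;
    in_flight--;
    return true;
}

/* Queue without waiting; if every slot is busy, reap the oldest first
 * instead of overflowing the pool. */
static fake_trans_t *queue_nonblocking(const uint8_t *buf, size_t len) {
    if (in_flight == QUEUE_DEPTH) {
        reap_one();
    }
    fake_trans_t *t = &pool[(head + in_flight) % QUEUE_DEPTH];
    t->tx_buffer = buf;
    t->length = len;
    in_flight++;
    assert(in_flight <= QUEUE_DEPTH);
    return t;
}

/* Call before rewriting or freeing any buffer handed to the driver. */
static void reap_all(void) {
    while (reap_one()) {
    }
}
```

Calling `reap_all()` before the application rewrites a frame buffer restores the guarantee the blocking version gave for free, while still leaving the CPU free between queueing and reaping.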

What I did NOT expect was that my network functions would start throwing exceptions. These occur in a few places, but mostly during address resolution (a blocking section in network code that otherwise relies on uasyncio streams and I/O events to stay non-blocking). In order of prevalence, the exceptions are ECONNABORTED, OSError -202, and a few others.

I can reliably reproduce these issues whenever my application queues a non-blocking SPI transaction of approximately 900 bytes in a rate-limited output loop at 30 Hz. If the SPI transaction is only 5 bytes I don't seem to see the exceptions, but I haven't pinned down the exact transaction size at which the issues begin.
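
For scale, a rough bus-time estimate helps. The SPI clock isn't stated, so the 8 MHz used here is an assumption: at that rate 900 bytes (7,200 bits) take about 0.9 ms, under 3% of the 33.3 ms frame period at 30 Hz, while 5 bytes take about 5 µs. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Time one frame occupies the bus, in milliseconds. Ignores gaps between
 * transactions and driver setup overhead. */
static double frame_tx_ms(uint32_t bytes, uint32_t spi_hz) {
    assert(spi_hz > 0);
    return (double)bytes * 8.0 * 1000.0 / (double)spi_hz;
}

/* Fraction of each output period during which the bus is busy. */
static double bus_duty(uint32_t bytes, uint32_t spi_hz, double frame_hz) {
    assert(frame_hz > 0.0);
    double period_ms = 1000.0 / frame_hz;
    return frame_tx_ms(bytes, spi_hz) / period_ms;
}
```

Plugging in the real clock rate shows whether each frame's transfer finishes well before the next one is queued.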

Interestingly, when the SPI transactions are _blocking_ (waiting for results, as in the stock ESP32 port), there are no issues, even though that configuration leaves less free CPU time for the network coroutine. Strange?

Unfortunately, I don't currently have simple reproduction steps for others to try at home. I might be able to put some together if anyone is interested.

I've been stumped all day. Does anyone have an idea why SPI DMA transactions could cause network exceptions like this? Is there an implicit dependency between the ESP32 SPI masters (VSPI in my case) and the lwIP or RF stack?

Thanks for any help!