File write speed

All ESP32 boards running MicroPython.
Target audience: MicroPython users with an ESP32 board.
pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK
Contact:

Re: File write speed

Post by pythoncoder » Tue May 15, 2018 5:13 am

@goatchurch I think you may have misunderstood my above comment. I was suggesting an asynchronous solution along these lines:

Code: Select all

import uasyncio as asyncio

async def sender():
    with open('test.txt', 'w') as f:
        swriter = asyncio.StreamWriter(f, {})
        while True:
            await swriter.awrite('Hello file\n')
            await asyncio.sleep_ms(10)

loop = asyncio.get_event_loop()
loop.create_task(sender())
loop.run_forever()
(The 10ms sleep was merely to keep the file size within bounds on this fast Linux box.)
Peter Hinch
Index to my micropython libraries.

jickster
Posts: 629
Joined: Thu Sep 07, 2017 8:57 pm

Re: File write speed

Post by jickster » Tue May 15, 2018 9:02 pm

pythoncoder wrote:
Tue May 15, 2018 5:13 am
@goatchurch I think you may have misunderstood my above comment. I was suggesting an asynchronous solution along these lines:

Code: Select all

import uasyncio as asyncio

async def sender():
    with open('test.txt', 'w') as f:
        swriter = asyncio.StreamWriter(f, {})
        while True:
            await swriter.awrite('Hello file\n')
            await asyncio.sleep_ms(10)

loop = asyncio.get_event_loop()
loop.create_task(sender())
loop.run_forever()
(The 10ms sleep was merely to keep the file size within bounds on this fast Linux box.)
While this may work in general, his problem is that the ESP OS freezes the entire system when a flash write is performed to the same flash memory where the code is stored.

To overcome this, he needs to store his code in a different location than SPI flash.

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK
Contact:

Re: File write speed

Post by pythoncoder » Wed May 16, 2018 5:25 am

OK, I hadn't appreciated that.
Peter Hinch
Index to my micropython libraries.

monquarter
Posts: 15
Joined: Sat Jan 11, 2020 2:31 pm

Re: File write speed

Post by monquarter » Sun Dec 13, 2020 12:25 am

I have a similar issue where I am trying to log sensor acquired data quickly, and tried the solution proposed above:

Code: Select all

import uasyncio as asyncio

async def sender():
    with open('test.txt', 'w') as f:
        swriter = asyncio.StreamWriter(f, {})
        while True:
            await swriter.awrite('Hello file\n')
            await asyncio.sleep_ms(10)

loop = asyncio.get_event_loop()
loop.create_task(sender())
loop.run_forever()
However, I get an error (testing on a Pyboard D-series with MicroPython 1.13):

Code: Select all

<Task>
Traceback (most recent call last):
  File "<stdin>", line 12, in <module>
  File "uasyncio/core.py", line 1, in run_forever
  File "uasyncio/core.py", line 1, in run_until_complete
  File "uasyncio/core.py", line 1, in wait_io_event
OSError: [Errno 22] EINVAL
>>>
I'm not sure if I am missing something, or if uasyncio supports writing to files at all. I wish it did, as it would be very helpful.
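I suspect the EINVAL from wait_io_event means a regular file can't be registered with the I/O poller. A plain blocking write inside the coroutine (no StreamWriter) should avoid that code path, although the write itself still blocks the loop while it is in progress. A minimal, untested sketch on the newer asyncio API:

Code: Select all

import uasyncio as asyncio

async def sender():
    with open('test.txt', 'w') as f:
        while True:
            # Plain blocking write: nothing gets registered with the poller.
            f.write('Hello file\n')
            await asyncio.sleep_ms(10)

asyncio.run(sender())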

jornamon
Posts: 3
Joined: Mon Jul 04, 2022 4:30 pm

Re: File write speed

Post by jornamon » Mon Jul 04, 2022 5:41 pm

Sorry to resurrect such an old post, but it's 2022 and I am still experiencing similar problems with high-rate data logging to a file, and this was one of the most useful threads I found.

I was wondering if the OP or someone else found a solution for this kind of problem, which I believe may not be so uncommon. When you are acquiring data from sensors at high speed, such big file-write delays mess up the sample frequency.

In my case, I have been experimenting both with a Pi Pico and with an ESP32-S3 (UM FeatherS3), both running MicroPython 1.19.1.

I am still investigating, but with a regular programming approach I get those delay spikes whenever the buffer is written to the file (by the system, or manually with a flush).

Now I am testing a threaded version using _thread, trying to do the file write on the other core. So far (it is early yet) I have had mixed results. Besides dealing with the instability of the _thread module, I have noticed an improvement with the Pi Pico, up to the point where the file-write thread cannot keep up with the data-generating thread and everything crashes; but if the total data throughput is not too high, I get stable sampling rates. With the ESP32 I have been having terrible results, with huge delays, but honestly I don't know whether its flash chip or the interface with the MCU is slower than the Pico's, or whether I just need to make some adjustments.
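In case it helps to picture it, the threaded version is structured roughly like this (a simplified sketch; the file path, the sampling period and the fake sample data are just placeholders for the real test code):

Code: Select all

import _thread, time

queue = []                       # shared sample buffer
lock = _thread.allocate_lock()   # protects access to the queue
running = True

def writer():                    # runs on the second core
    with open('/sd/log.csv', 'w') as f:
        while running or queue:
            lock.acquire()
            batch = queue[:]     # grab everything queued so far
            del queue[:]
            lock.release()
            for line in batch:
                f.write(line)    # the slow flash/SD write happens on this core
            if not batch:
                time.sleep_ms(5)

_thread.start_new_thread(writer, ())

for n in range(500):             # sampling loop on the main core
    sample = '%d,%d\n' % (n, time.ticks_us())  # stand-in for real sensor data
    lock.acquire()
    queue.append(sample)
    lock.release()
    time.sleep_ms(10)
running = False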

If there's interest I could share some results after a bit more testing.

While I am not an expert, from what I have been researching, the fact that the data file resides in the same flash as the Python program may be a very important factor: writing the file freezes execution, which makes the asyncio-based solutions useless in this case, even though they might otherwise be preferable to _thread.

These are the possibilities I want to investigate to try to solve this problem. Some are not very desirable, but may be a workaround. I would appreciate it if anyone has some thoughts to share on any of them:
  • The possibility of forcing the Python code (in case it is small enough) to be loaded into RAM at start, so the flash chip can be used exclusively for data logging. Does anybody know if this is possible?
  • Fine-tune the threaded version to see if a true improvement can be achieved.
  • Find a truly asynchronous (non-blocking) file write method that could really work with something based on asyncio.
  • If the file system constitutes a significant overhead (which I don't know at this point), try to write raw data to flash instead of a file and process it later.
  • If data acquisition is not continuous, use a board with enough PSRAM to buffer all data and only write to the file after the sampling phase (see the sketch after this list).
  • If data acquisition is continuous but you can still fit a lot of PSRAM, try big buffers and big writes to see if there is an improvement, though I think that would just lead to very spaced-out but huge delays.
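For the PSRAM-buffering idea mentioned above, I am thinking of something along these lines (a rough sketch; the record size, sample count and packing of the sensor data are all placeholders):

Code: Select all

import time

RECORD = 50 * 4                    # 50 integers of 4 bytes each per sample
SAMPLES = 500
buf = bytearray(RECORD * SAMPLES)  # preallocated once, lives in (PS)RAM
mv = memoryview(buf)               # lets us fill slices without copies

for n in range(SAMPLES):
    # pack the sensor readings into mv[n*RECORD:(n + 1)*RECORD] here
    time.sleep_ms(10)              # stands in for the real sampling period

# only now touch the flash: one big write after the sampling phase is done
with open('log.bin', 'wb') as f:
    f.write(buf)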

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: File write speed

Post by jimmo » Tue Jul 05, 2022 5:55 am

jornamon wrote:
Mon Jul 04, 2022 5:41 pm
Besides dealing with the instability of the _thread module
I've been working on Pico & _thread concurrency reliability over the past few days and have sent a few PRs that should hopefully be merged soon. I was under the impression that a large class of issues was fixed in March thanks to a fix to the garbage collector (but only released recently with 1.19.1), but if you have other issues please let me know.
jornamon wrote:
Mon Jul 04, 2022 5:41 pm
I have noticed an improvement with the Pi Pico, up to the point where the file-write thread cannot keep up with the data-generating thread and everything crashes; but if the total data throughput is not too high, I get stable sampling rates.
On the Pico, I have found that using flash (or the filesystem) while using the second core is basically a guaranteed crash -- see https://github.com/micropython/micropython/pull/8846 for the fix.
jornamon wrote:
Mon Jul 04, 2022 5:41 pm
The possibility of forcing the Python code (in case it is small enough) to be loaded into RAM at start, so the flash chip can be used exclusively for data logging. Does anybody know if this is possible?
In general though, because the Pico uses XIP, any flash operations are going to suspend execution of both cores, unless we can get the second core executing from RAM. This is basically impractical for executing Python code unfortunately as you'd need to put the entire VM into RAM. (Same story with the ESP32).
jornamon wrote:
Mon Jul 04, 2022 5:41 pm
If the file system constitutes a significant overhead (which I don't know at this point), try to write raw data to flash instead of a file and process it later.
The use of LittleFS is supposed to keep the filesystem overhead close to zero compared to just writing to the flash directly.

I think any system that wants to do delay-free logging to flash needs to be talking to a different flash chip than the one that is currently being used for XIP, especially when we get a SPI DMA driver for rp2 (although making the filesystem work with asyncio is going to be a challenge). (Unless of course you write the whole thing in C and do _very_ careful synchronisation and put significant amounts of code into RAM).

(In more detail on the filesystem thing, our problem is that the (third-party) LittleFS driver itself is written in C and is not asynchronous or callback-based, so it blocks while the flash write is in progress... this would be a big undertaking to fix and would require a redesign of the LittleFS driver and MicroPython's VFS layer.)

It's worth reading the linked post from https://github.com/micropython/micropyt ... 1172290372 (i.e. https://kentindell.github.io/2021/03/05 ... inversion/ )

jornamon
Posts: 3
Joined: Mon Jul 04, 2022 4:30 pm

Re: File write speed

Post by jornamon » Thu Jul 14, 2022 7:05 pm

Thank you so much for your reply, jimmo! It was really useful and I learnt a lot. I hope many people looking for low-lag, high-frequency data logging in MicroPython come across it at some point.

I'm sorry for my delayed reply, but I couldn't find the time to finish my testing, and I wanted to include the results here because they may be useful to others.

In fact, the results pretty much confirm what jimmo and others have said in this thread: having the log file in the same flash chip that is used for XIP is asking for trouble, especially as the sampling period gets small. The Pi Pico actually crashed when using the XIP flash for T=~10ms or faster. The ESP32 didn't crash, but the benefits of multithreading were partially negated. I believe the _thread implementations of those two ports are quite different.

What I have found interesting is that multithreading works really well when storing the file in a different place, almost completely getting rid of long delays.

As far as I can see right now, the preferred method for high-frequency file logging should be to use a different chip/card for the file and to use separate threads for the file-write and sampling tasks. One caveat is that the queue grows indefinitely if the sampling throughput is higher than what you can write to the file, but at least you can decide how to deal with the growing queue depending on the application.

Regarding the tests, they simulated a sensor sampling loop with a delay (no real sensors were attached); for each loop/sample, 50 integers of data were generated, and a total of 500 samples were taken in each test run. The sampling periods (T) were 50, 25 and 10 ms.

Two boards were used, a Pi Pico and a FeatherS3 (ESP32-S3 with 8 MB of PSRAM).

Three file storage options were considered: the flash chip where the program is stored (flash(xip)); an XTSD chip from Adafruit (xtsd), https://www.adafruit.com/product/4899, which behaves like an SD card but without a removable card and appears to be faster than a regular SD card reader; and finally a regular SD card (sdcard) (SanDisk Class 10).

All of those storage devices were tested with a single-threaded (st) and a multithreaded (mt) version; the latter used one thread that samples and writes to a queue (with a lock) and another thread that reads the queue and writes to the file.

For each test run I calculated the Max, Min, Avg and Std. deviation of the sampling times. Another metric that I called overhead, (Tavg-T)/T, was added to facilitate comparison among all test runs with a single meaningful number.
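Concretely, the numbers come from the list of per-sample periods, roughly like this (a simplified sketch, not the exact test code, assuming one time.ticks_us() timestamp per sample):

Code: Select all

import time

def period_stats(ticks, T_ms):
    # ticks: list of time.ticks_us() timestamps, one per sample
    periods = [time.ticks_diff(ticks[i + 1], ticks[i]) / 1000
               for i in range(len(ticks) - 1)]   # periods in ms
    avg = sum(periods) / len(periods)
    std = (sum((p - avg) ** 2 for p in periods) / len(periods)) ** 0.5
    overhead = (avg - T_ms) / T_ms
    return max(periods), min(periods), avg, std, overhead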

I think overhead combined with the Std. deviation offers a good picture of the performance without too much data, so I highlighted those two in one set of charts, both as a method-to-method and a board-to-board comparison. I only did this for the T=10ms case, which is the one that pushes the limits a little bit.

The other charts are the whole statistical data for all the test runs.

Charts attached to the original post: overhead and std. deviation for T=10ms, plus the full statistics for T=10ms, T=25ms and T=50ms.

I hope this can be useful information for people trying to log data at high speed with low and predictable lag. Of course it would be great to hear comments from all the experts that are here in the forum.

V!nce
Posts: 20
Joined: Sun May 22, 2022 4:35 pm

Re: File write speed

Post by V!nce » Sun Jul 24, 2022 1:05 pm

Hello!

I am very interested in what you did. I'm working on the same kind of problem and started a similar discussion some time ago (i.e. how to write a lot of data to a file without impacting performance too much). I am going to try to replicate what you did, but I wanted to ask if you could post some of the code snippets you used?
I am interested to see how you implemented the multithreaded writing, the queue, the lock, etc. For now, the way I did it was just to have a global bytearray that gets filled up with data from multiple asyncio coroutines while a thread runs in parallel and writes the data out at regular time intervals.
I was also wondering how you mounted the SD card: via the dedicated chip or via SPI?

Thanks for the benchmarks and explanation! I would never have thought that using an external storage device would increase speed!
Cheers

jornamon
Posts: 3
Joined: Mon Jul 04, 2022 4:30 pm

Re: File write speed

Post by jornamon » Thu Jul 28, 2022 4:23 am

I'm glad it was useful to you!

The code and the results can be found here: https://github.com/jornamon/micropython ... peed_tests. It's not very tidy, but better than nothing.

In the above testing, the SD card was always connected via SPI, but recently I found a Teensy 4.1 lying around and tested it too. It has a single-core processor, but besides being really fast, it has an integrated SD card reader that uses a 4-bit SDIO protocol instead of SPI, which made it perform very well even for T=10ms. This board, or others using SDIO, could be an interesting option in cases where multithreading does not really solve the problem because the data throughput is constantly higher than the maximum write speed (queue + threading won't fix a persistent write-speed deficit). The Teensy performs almost as poorly as the other boards when connecting an external SPI reader instead of using the SDIO slot.

A few more things I found after the initial tests:
  • Performance improves when increasing the SPI frequency, up to a point; just check whether the defaults are optimal for your setup (see the mount sketch after this list).
  • Even equally rated cards can perform differently.
  • Filesystem parameters also affect performance. In my setup, the best performance was achieved when using the biggest cluster size in FAT32 (512 bytes per sector, 128 sectors per cluster).
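For the SPI frequency point (first item above), the mount looks something like this -- assuming the standard sdcard.py driver from micropython-lib is copied to the board; the SPI bus, CS pin and baudrate here are board-specific placeholders:

Code: Select all

import machine, os, sdcard

spi = machine.SPI(1)                            # bus and pins depend on the board
cs = machine.Pin(15)                            # placeholder CS pin
sd = sdcard.SDCard(spi, cs, baudrate=20000000)  # recent sdcard.py accepts a baudrate argument
os.mount(sd, '/sd')

with open('/sd/log.csv', 'w') as f:             # log files now go to the card
    f.write('hello\n')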
I haven't cleaned up those findings, but you can dive into the data (Excel file) if you feel like it. I cannot offer a deep explanation for the results, but I hope these empirical facts can give you some hints.

Please share your progress!
