How to store high throughput, real time data to the flash

All ESP32 boards running MicroPython.
Target audience: MicroPython users with an ESP32 board.
V!nce
Posts: 20
Joined: Sun May 22, 2022 4:35 pm

How to store high throughput, real time data to the flash

Post by V!nce » Sun May 22, 2022 5:14 pm

Hello,

This is my first post here! I started coding in MicroPython this weekend and it's been a real blast. A small disclaimer before explaining my project/issues: I'm not YET using a real ESP32; for now I'm coding in Wokwi's ESP32 simulator (https://wokwi.com/projects/new/micropython-esp32). My controllers should arrive during the week.

So, the project is pretty simple: I have many biosensors collecting body data at a high rate (around 64 acquisitions/sec at most for one of the probes). I have no problem running all the async loops and collecting the data. In theory, the device should be connected to a smartphone and stream data every second, then clear the data lists and repeat until turned off.

However, implementing a local mode is crucial. If the device is disconnected, it should be able to log all the data to flash, either until it reconnects to the phone or until the end of the session. However, I can't find a good, optimized, clean way to save the data locally.

I thought of/tried:
  1. Appending values to several CSV files. However, because I would be writing an unknown quantity of data every second, I feel it is going to be very slow and/or fragment the data on the flash too much. Also, CSV files are quite heavy because of the text encoding.
  2. Using JSON or TinyDB, but both are worse than CSV files in terms of encoding overhead.
  3. Opening, writing to, and closing binary files every second for every sensor, writing packed lists that I would then clear. The idea is to have a timestamp in the file names that would let me reconstruct the data later. However, after just a few seconds the simulation slows down and stops, with no way to know whether I reached the flash limit, which seems unlikely.
  4. Using the btree database, appending sensor data to keys defined at the beginning instead of creating new keys for each entry. This does not seem to work properly, and I have no way of verifying whether the data is flushed to the stream correctly. I also suspect that reading the key/value pair back just to append my bytes to the value is a problem.
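For reference, idea 3 boiled down to something like this (a minimal sketch; the sensor name and the timestamped file naming scheme are just placeholders I made up):

```python
import struct
import time

def flush_buffer(sensor_name, samples):
    """Pack buffered int readings as little-endian int16 values and
    dump them to a timestamped binary file, then clear the buffer."""
    fname = "{}_{}.bin".format(sensor_name, int(time.time()))
    with open(fname, "wb") as f:
        f.write(struct.pack("<{}h".format(len(samples)), *samples))
    samples.clear()
    return fname
```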
I'm kind of running out of ideas, and even though I'm quite experienced in CPython and logic optimisation, I'm just discovering the world of embedded devices haha :D
I have already read a lot on the forum that helped me start, but I couldn't really find related topics. I thought this would be the easy part of the project lmao

Thanks in advance for your advice!
Last edited by V!nce on Sun May 22, 2022 6:20 pm, edited 1 time in total.

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: How to store high throughput, real time data to the flash

Post by pythoncoder » Sun May 22, 2022 5:46 pm

What is your expected throughput in bytes/sec? What is the maximum duration of a disconnect (i.e. the total storage volume)?
Peter Hinch
Index to my micropython libraries.

V!nce
Posts: 20
Joined: Sun May 22, 2022 4:35 pm

Re: How to store high throughput, real time data to the flash

Post by V!nce » Sun May 22, 2022 6:09 pm

Here is the list of data I plan to collect with the corresponding sampling rate:
  • Skin conductance, 16 times/sec
  • SpO2, 64 times/sec
  • XYZ acceleration, 10 times/sec/axis
  • XYZ Gyration, 10 times/sec/axis
  • Skin temperature, 1 time/sec
I think most measurements need at most 2 bytes each (int16, if I'm right?) once packed, so it's about 282 bytes/sec to store/send. I don't know if that's a lot or not, but be aware that for now I'm appending the values to lists, because I get them as Python ints, not bytes. With the sampling rates described, the 111 KB of RAM available fills up in about 5-10 seconds...
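For clarity, the arithmetic behind the 282 bytes/sec figure:

```python
# Samples per second for each probe (three axes each for accel/gyro)
RATES = {
    "skin_conductance": 16,
    "spo2": 64,
    "accel_xyz": 3 * 10,
    "gyro_xyz": 3 * 10,
    "skin_temp": 1,
}
BYTES_PER_SAMPLE = 2  # each value packed as a 16-bit int

throughput = sum(RATES.values()) * BYTES_PER_SAMPLE
print(throughput)  # 282 bytes/sec
```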

My idea was simple: after a disconnection, we can store some data in RAM. If we get the connection back within a few seconds, before the heap fills up, we can send the cached data. If we can't, then we store it locally indefinitely instead of sending it. I am planning to buy an SD card shield to be more flexible; I would rather not hammer the small 4 or 8 MB of onboard flash.

Also, if I am not mistaken, writing to files is blocking, right?

V!nce
Posts: 20
Joined: Sun May 22, 2022 4:35 pm

Re: How to store high throughput, real time data to the flash

Post by V!nce » Mon May 23, 2022 11:31 pm

Hi,

So I have been thinking and working on a solution that is actually pretty simple and works well. I created another coroutine that takes the buffers and writes them to a binary file. This works flawlessly. My only problem now seems to be the sampling rates: whatever frequency I set, each second there will not be N samples taken for each sensor. From my early testing, it seems to be because of uasyncio. I wanted to run all my sampling in separate tasks, as I thought that was an elegant and natural thing to do, but it seems to actually be a problem.
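The writer coroutine is roughly this (a sketch using the plain asyncio names that uasyncio mirrors; the buffer layout and file names are made up for illustration):

```python
import asyncio
import struct

async def log_writer(buffers, stop):
    """Every second, pack each sensor's buffered ints as int16 and
    append them to that sensor's binary log, then clear the buffer."""
    while not stop.is_set():
        await asyncio.sleep(1)
        for name, samples in buffers.items():
            if samples:
                with open(name + ".bin", "ab") as f:
                    f.write(struct.pack("<{}h".format(len(samples)), *samples))
                samples.clear()
```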

I also realised that I was completely wrong in my first posts. Compared to a few things I saw today, I'm absolutely not working with big amounts of data. I thought I was asking too much of the ESP, when it was just me being inexperienced with asyncio in general. I will still have to find a solution for my new problem. To be honest, I'm just hoping the simulator is misbehaving, but I'm probably delusional.

I'm hesitant to delete the topic, as it is quite useless now and the solution has already been documented elsewhere.

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: How to store high throughput, real time data to the flash

Post by pythoncoder » Wed May 25, 2022 11:08 am

Writing to a file does block so this will cause other uasyncio tasks to pause. Asking a task to schedule at a rate of 64Hz is ambitious: tasks are scheduled in round-robin fashion so all your other tasks will get a slice of the action in between runs of the 64Hz task. Unless they are very minimal and efficient, this is unlikely to happen in 15ms.

For rates as high as that you really need a hard IRQ. Alas, the ESP32 supports only soft IRQs.

If you can tolerate brief interruptions from file writes and garbage collection you should be OK. If not, you may need to look at platforms that support hard IRQs (Pyboard/STM32, Pico/RP2). Even then you'd need to measure file write blocking time with your data.
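Measuring that blocking time is straightforward; a sketch along these lines (written with CPython's time.monotonic() — on a board you would substitute time.ticks_ms()/ticks_diff() and your real payload size):

```python
import time

def time_write_ms(path, payload):
    """Return how long one open/write/close cycle takes, in ms."""
    t0 = time.monotonic()  # on MicroPython: time.ticks_ms()
    with open(path, "wb") as f:
        f.write(payload)
    return (time.monotonic() - t0) * 1000

cost = time_write_ms("probe.bin", bytes(4096))
```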
Peter Hinch
Index to my micropython libraries.

V!nce
Posts: 20
Joined: Sun May 22, 2022 4:35 pm

Re: How to store high throughput, real time data to the flash

Post by V!nce » Wed May 25, 2022 11:25 pm

Thanks a lot for your advices.

So the idea is to put everything in a watch, and the ESP32 is absolutely perfect for this because of its integrated BLE and WiFi, and the very small form factors available with battery connectors, regulators, and USB-C. It is clearly more comfortable for someone as inexperienced as I am.
So I will still try it on the ESP32, especially because I am getting 3 for free next week. Also, the watch doesn't do anything except measure these 6 to 9 variables (I can do the processing on the phone or on a PC; it's basically just applying filters to smooth the curves...).

I may have noticed a bug with Wokwi and uasyncio on the ESP32. When I time one coroutine doing 1000 analogue measurements against the same code in a synchronous function, the task takes around 2000 ms to complete, whereas the function takes about 1000 ms. When I run exactly the same code on the simulated RP2040 Pico, I get 1000 ms in both cases, so I think it's possible to achieve my objective if it's a weird glitch in the simulation (I hope). Weirdly enough, I saw that using uasyncio.sleep_ms() gave even worse results than plain uasyncio.sleep() on the ESP32 (by a noticeable margin, actually!).

I may have found a way to alleviate long and irregular interruptions. Instead of having 9 different coroutines that write all my buffers every second, I could simply write the 2 data bytes every time I get them. I could also subtract a small offset from the await uasyncio.sleep() to compensate for the time taken to take a measurement and write it.
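That sleep-offset idea would look something like this drift-compensated loop (a sketch with asyncio names and a dummy read(); on the board it would be uasyncio plus time.ticks_ms()/ticks_diff()):

```python
import asyncio
import time

async def sample_at(hz, read, out, n):
    """Take n readings at hz samples/sec, subtracting the time spent
    measuring from each sleep so the overall rate doesn't drift."""
    period = 1.0 / hz
    next_t = time.monotonic()
    for _ in range(n):
        out.append(read())
        next_t += period
        delay = next_t - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
```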

For now I'm opening 9 files at the same time and writing the bytes to them every second, but maybe I could find a way to write everything to a single file. However, I don't have much of an idea of how to format such a file. I may simply decide to partition the memory in 9 so the files don't get fragmented. I don't know if fragmentation is an issue, but I believe it's cleaner that way anyway.

If you have any idea how to format my data, I'd be glad to hear from you. I was thinking that instead of packing 2 bytes, I could pack 3 (one indicator byte saying which measurement follows, and 2 bytes for the value). That way I can store everything in one buffer AND one file, avoid RAM and flash fragmentation, and ease the writing and GC. It is less data efficient, as it increases the size of the files and buffer by 50%, but it may be worth it. I will have to use external storage, because if I am correct, the 16 MB available on the TTGO T7 v1.5 will only be enough for half a day. So if I add an SD card I will have way more memory than necessary! For the RAM issue, I think the board can handle a larger buffer, as I know I have more than 50 KB available. Having fewer coroutines loaded at once may also offset that loss of efficiency.
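The 3-byte tagged record would look something like this (the tag numbers are arbitrary choices of mine, just for illustration):

```python
import struct

TAGS = {"spo2": 0, "skin_conductance": 1, "skin_temp": 2}  # arbitrary IDs
NAMES = {v: k for k, v in TAGS.items()}

def pack_record(sensor, value):
    # 1 indicator byte + little-endian int16 value = 3 bytes per record
    return struct.pack("<Bh", TAGS[sensor], value)

def unpack_records(blob):
    """Walk a byte blob of concatenated records, yielding (name, value)."""
    for offset in range(0, len(blob), 3):
        tag, value = struct.unpack_from("<Bh", blob, offset)
        yield NAMES[tag], value
```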

I'm going to keep you updated, and I'm really crossing my fingers. Otherwise I'll do as you suggested and buy an RP Pico, which is amazingly cheap! But I'll have to work around not having BLE and battery options directly on the board...

karfas
Posts: 193
Joined: Sat Jan 16, 2021 12:53 pm
Location: Vienna, Austria

Re: How to store high throughput, real time data to the flash

Post by karfas » Thu May 26, 2022 7:43 am

You want to save the data of:
  • Skin conductance, 16 times/sec
  • SpO2, 64 times/sec
  • XYZ acceleration, 10 times/sec/axis
  • XYZ Gyration, 10 times/sec/axis
  • Skin temperature, 1 time/sec
I seriously doubt you will get values worth saving at this pace from the SpO2, conductance and temperature sensors. The human body doesn't change that fast, even if your sensors can read e.g. SpO2 64 times/sec.

You might be able to save a lot of memory (and therefore time writing to the files) by adding a counter for repeated values.
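A run-length pass is the simplest form of that counter; a sketch, not tied to any particular sensor:

```python
def rle_encode(samples):
    """Collapse runs of identical consecutive readings into
    [value, count] pairs; slow-moving signals shrink a lot."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original samples."""
    return [v for v, n in runs for _ in range(n)]
```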
A few hours of debugging might save you from minutes of reading the documentation! :D
My repositories: https://github.com/karfas

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: How to store high throughput, real time data to the flash

Post by pythoncoder » Thu May 26, 2022 9:06 am

Opening nine files at once sounds cumbersome. You might like to read my file formatting FAQ.
Peter Hinch
Index to my micropython libraries.

V!nce
Posts: 20
Joined: Sun May 22, 2022 4:35 pm

Re: How to store high throughput, real time data to the flash

Post by V!nce » Thu May 26, 2022 8:00 pm

Hi!

Opening and closing them isn't really a problem; I just used a loop that parses the buffer and launches a writing task for each metric it finds. That lets me dynamically adjust what I measure at the end of the code with minimal hardcoding. However, I completely agree that it is an unoptimized, lazy solution and that it introduces a lot of overhead!

I read what you sent, thank you for the link! I wish I had found that earlier; it would have made things clearer for me haha. There are a few things in it that I didn't see in the docs, so very useful. It also confirms that I should really use packing, as everything seems to use the write method anyway to store data in files. I was already using the ustruct library, so I was on the right track! :)

Do you know if writing to a file actually transfers data from RAM to flash immediately? Or does it keep the changes in RAM for a while?

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: How to store high throughput, real time data to the flash

Post by pythoncoder » Mon May 30, 2022 4:59 pm

V!nce wrote:
Thu May 26, 2022 8:00 pm
...
Do you know if writing to a file actually transfers data from RAM to flash immediately? Or does it keep the changes in RAM for a while?
Normally there is no reason to be concerned about this. There is (obviously) a guarantee that data will be written out when you close the file, or when an implicit close() occurs on exit from a context manager. You can also call flush() on a stream to force a write.
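For instance (os.sync() exists on some MicroPython ports only, hence the guard; check your port's docs):

```python
import os

f = open("log.bin", "ab")
f.write(b"\x01\x02")
f.flush()   # push the stream buffer down to the filesystem layer
# On ports that provide it, os.sync() additionally commits the
# filesystem's own caches to the flash chip.
if hasattr(os, "sync"):
    os.sync()
f.close()
```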
Peter Hinch
Index to my micropython libraries.
