Thank you so much for your reply jimmo! It was really useful and I learned a lot. I hope many people looking for low-lag, high-frequency data logging in MicroPython come across it at some point.
I'm sorry for my delayed reply, but I couldn't find the time to finish my testing, and I wanted to include the results here, because maybe they can be useful to others.
In fact, the results pretty much confirm what jimmo and others have said in this thread: having the log file in the same flash chip that is used for XIP is asking for trouble, especially as the sampling period gets small. The Pi Pico crashed when using the XIP flash for T=~10 ms or faster. The ESP32 didn't crash, but the benefits of multithreading were partially negated. I believe the _thread implementations of those two ports are quite different.
What I have found interesting is that multithreading works really well when storing the file in a different place, almost completely getting rid of long delays.
As far as I can see right now, the preferred method for high-frequency file logging is to use a different chip/card for the file and separate threads for the file-write and sampling tasks. One caveat is that the queue grows indefinitely if the sampling throughput is higher than what you can write to the file, but at least one can decide how to deal with the growing queue depending on the application.
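Since how to deal with the growing queue is left open above, here is one hypothetical policy as an illustration: cap the queue and drop the oldest sample when it is full, so RAM stays bounded at the cost of losing data. `MAX_QUEUE`, `enqueue` and the drop counter are my own names for this sketch, not part of the actual tests:

```python
MAX_QUEUE = 200  # hypothetical cap; tune to your board's RAM budget

def enqueue(queue, sample, dropped):
    # If the writer can't keep up, discard the oldest sample so
    # memory stays bounded, and count how much data was lost.
    if len(queue) >= MAX_QUEUE:
        queue.pop(0)
        dropped[0] += 1
    queue.append(sample)

# Simulate a writer that has stalled while 250 samples arrive:
queue, dropped = [], [0]
for i in range(250):
    enqueue(queue, [i] * 50, dropped)
print(len(queue), dropped[0])  # → 200 50
```

Other policies (drop the newest, flush a warning marker into the log, or block the sampler) are equally easy to slot into the same function, depending on whether completeness or timing matters more for the application.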
Regarding the tests, they were made simulating a sensor sampling loop with a delay (no real sensors were attached); for each loop/sample, 50 integers of data were generated, and a total of 500 samples were taken in each of the tests. The sampling periods (T) were 50, 25 and 10 ms.
Two boards were used, a Pi Pico and a FeatherS3 (ESP32-S3 with 8 MB of PSRAM).
Three file storage systems were considered: the flash chip where the program is stored (flash(xip)); an XTSD chip from Adafruit (xtsd), https://www.adafruit.com/product/4899, which behaves like an SD card but without a removable card and appears to be faster than a regular SD card reader; and finally a regular SD card (sdcard) (SanDisk Class 10).
All of those storage devices were tested in a single-threaded (st) and a multithreaded (mt) version; the latter used one thread that samples and writes to a queue (with a lock) and another thread that reads the queue and writes to the file.
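The two-thread setup can be sketched roughly like this. This is a minimal illustration using `_thread`, with a plain list as the queue and another list standing in for the log file; the names, the drain strategy and the batch sizes are my own, not the exact test code:

```python
import _thread
import time

queue = []                      # shared sample queue
lock = _thread.allocate_lock()
sampling_done = False
writer_done = False
written = []                    # stands in for the log file in this sketch

def writer():
    # Consumer thread: drain the queue in batches and "write" each
    # sample; a real logger would write to a file opened on the
    # SD/XTSD filesystem instead of appending to a list.
    global writer_done
    while True:
        lock.acquire()
        batch = queue[:]
        del queue[:]
        lock.release()
        for sample in batch:
            written.append(sample)   # f.write(...) in real use
        if sampling_done and not batch and not queue:
            break
        time.sleep(0.001)
    writer_done = True

_thread.start_new_thread(writer, ())

# Producer loop: 500 samples of 50 integers each, mirroring the
# test setup described above (no delay here to keep the sketch fast;
# a real T=10 ms loop would sleep between iterations).
for i in range(500):
    sample = [i] * 50           # placeholder for real sensor reads
    lock.acquire()
    queue.append(sample)
    lock.release()

sampling_done = True
while not writer_done:          # wait for the writer to drain the queue
    time.sleep(0.001)
print(len(written))             # → 500
```

Because only the quick list operations happen under the lock, a slow flash write in the consumer never blocks the sampling loop directly; the queue just grows until the write completes, which is exactly the trade-off discussed above.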
For each test run I calculated the Max, Min, Avg and Std. deviation of the sampling times. Another metric, which I called overhead, (Tavg-T)/T, was added to facilitate comparison among all test runs with a single meaningful number.
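For clarity, this is how those statistics can be computed from a list of measured sampling intervals; `sample_stats` and the example intervals are just an illustration, not the actual analysis script:

```python
def sample_stats(intervals, T):
    # intervals: measured times between samples (ms); T: nominal period (ms)
    n = len(intervals)
    t_avg = sum(intervals) / n
    var = sum((t - t_avg) ** 2 for t in intervals) / n  # population variance
    return {
        "max": max(intervals),
        "min": min(intervals),
        "avg": t_avg,
        "std": var ** 0.5,
        "overhead": (t_avg - T) / T,   # (Tavg - T) / T
    }

# Example: five intervals around a nominal T = 10 ms
stats = sample_stats([10.0, 10.5, 11.0, 10.5, 10.0], 10)
print(round(stats["overhead"], 3))  # → 0.04
```

An overhead of 0 means the loop hits its nominal period exactly on average, while the standard deviation captures how predictable the individual intervals are; that is why the two together summarize a run well.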
I think overhead combined with the Std. deviation offers a good picture of the performance without too much data, so I highlighted those two in one set of charts, both in a method-to-method and a board-to-board comparison. I only did this for the T=10ms case, which is the one that pushes the limits a little.
The other charts are the whole statistical data for all the test runs.
Overhead and std. dev for T=10ms
T=10ms
T=25ms
T=50ms
I hope this information can be useful to people trying to log data at high speed with low and predictable lag. Of course, it would be great to hear comments from all the experts here on the forum.