pythoncoder wrote: Tue Jul 12, 2022 10:03 am
I suggest you review your use of micropython.native. Like all optimisations it's best only to apply it where there is a true performance bottleneck.
My approach is to identify which part of the code is causing the slowdown and optimise it until it squeals.
micropython.native rarely makes a large difference. micropython.viper makes a big difference but needs some effort to use. For even better performance consider inline assembler or C modules.
Precisely my approach. While I do apply the micropython.native decorator to most functions, a good number of them don't need it. And, yes, I do optimize code to the absolute limit before resorting to such tools. I take this as far as measuring the execution time of individual statements and lines of code down to the microsecond with the logic analyzer.
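For the curious, per-statement timing of this sort can be done by bracketing the code under test with a GPIO toggle and reading the resulting pulse width on the analyzer. A minimal sketch (the pin number and function are illustrative, not from the project):

```python
from machine import Pin
import micropython

DEBUG_PIN = Pin(15, Pin.OUT)   # any free GPIO will do

@micropython.native
def timed_section(data):
    DEBUG_PIN.value(1)         # rising edge: start of the measured region
    checksum = 0
    for b in data:             # the statement(s) under test
        checksum ^= b
    DEBUG_PIN.value(0)         # falling edge: pulse width = execution time
    return checksum
```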
The image above shows four simultaneous communications channels and a couple of status flags. One of the key optimizations was the communications latency between two ports, seen between the first and third traces. In this case it is down to 268 microseconds, or about three bytes at 115,200 baud. I need it to be closer to one character, but that is simply impossible with MicroPython unless you do absolutely nothing with the data as the bytes arrive. If I didn't use 'native on this portion of the code the latency would be much greater.
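To give a sense of where that floor comes from, here's a minimal sketch of a port-to-port forwarding loop of the kind I'm describing (pin assignments, buffer size, and the forward() name are illustrative, not my actual code); everything the interpreter does between the read and the write contributes to that latency:

```python
from machine import UART, Pin
import micropython

uart_a = UART(0, baudrate=115200, tx=Pin(0), rx=Pin(1))
uart_b = UART(1, baudrate=115200, tx=Pin(4), rx=Pin(5))

buf = bytearray(64)            # preallocated so the loop does no large allocations
mv = memoryview(buf)

@micropython.native
def forward():
    # Poll port A and push whatever arrived straight out of port B.
    n = uart_a.readinto(buf)   # returns byte count, or None if nothing arrived
    if n:
        uart_b.write(mv[:n])   # memoryview slice avoids copying the data
```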
Also, note what happens to the spacing of the bytes on trace 3. The bytes should come out back to back, but due to MicroPython overhead that I simply cannot get around, I get gaps of as much as 125 microseconds. Believe me when I say I have optimized this a million different ways. When accessing a value in a variable can take 100 microseconds, there isn't much you can do other than drop down to the most basic types to reduce that to maybe 20 microseconds or so.
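At that level, about the only lever left is hoisting dotted lookups into locals, since a local variable access is a fast slot read while every self.x or module.x access costs dictionary lookups. A sketch of the pattern (the class and names are illustrative):

```python
import micropython

class Packet:
    def __init__(self):
        self.buf = bytearray(64)

    @micropython.native
    def checksum(self, n):
        buf = self.buf         # hoisted: self.buf is a dict lookup per access,
        total = 0              # while a local is a fast slot read
        for i in range(n):
            total ^= buf[i]
        return total
```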
The fifth trace shows two pulses: the first is the busy flag from the first port (top trace), and the second is the time it takes for that port's packet processor to do its job and issue a response. These pulses are 903 and 578 microseconds, respectively. The busy time is heavily affected by the character-to-character gaps you see on trace 3. The packet processor's processing time has been optimized to the best of my abilities.
This is how microseconds turn into milliseconds. Right now the time from start of packet to the packet processor's done state is about 2 milliseconds, the equivalent of 23 characters at 115,200 baud (one character takes roughly 87 microseconds) to fully process a 5-character packet. When I started, this time was significantly greater. I got down to 2 ms by shaving a few microseconds here and there and restructuring the code multiple times. Without micropython.native this would likely have been impossible.
I can't use 'viper because, well, I didn't know enough to structure the code in a way that could use it. By the time I realized I could not get any more performance out of 'native, it was simply too late to re-engineer this almost from scratch.
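For readers who haven't used it, the restructuring 'viper demands is mostly about types: arguments become machine integers or raw pointers, so code built around Python objects has to be reshaped around flat buffers. A sketch of what that looks like (illustrative, not my code):

```python
import micropython

@micropython.viper
def checksum(buf: ptr8, n: int) -> int:
    # ptr8 is raw byte access: no bounds checks, no object overhead.
    # Any buffer-protocol object (bytes, bytearray) converts to it.
    total = 0
    for i in range(n):
        total ^= buf[i]
    return total

data = bytearray(b"\x01\x02\x03")
print(checksum(data, len(data)))   # -> 0
```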
The only way to do better would be to recode the processor in assembler, which would require setting up a block of memory with a bytes() object to create the equivalent of a set of memory-mapped data structures at fixed locations that assembler code could access. That's the way we used to do it in the old days of the 6502, 8051, and other 8-bit microcontrollers.
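MicroPython's inline assembler can get close to that model: on ports where @micropython.asm_thumb is enabled (it is on the rp2 port, to my knowledge), a buffer argument arrives as the address of its data, which is essentially the fixed-location trick described above. A hedged sketch, assuming that behavior:

```python
import micropython

@micropython.asm_thumb
def bump_status(r0):       # r0 = address of the buffer's data
    ldrb(r1, [r0, 0])      # load the "status" byte at offset 0
    add(r1, 1)             # increment it
    strb(r1, [r0, 0])      # store it back

regs = bytearray(8)        # plays the role of a memory-mapped register block
bump_status(regs)          # buffer objects are passed by data pointer
print(regs[0])             # -> 1
```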
The point is that I think (I'm not so full of myself as to say I know) I have trimmed as much fat as possible at the MicroPython level. If I knew at the start of this project what I know today, I would have implemented the communications layer in C and compiled it into MicroPython as a service to the rest of the application, which could then use MicroPython without serious performance concerns. I would also have provided what I'd call data services from C to the MicroPython layer: managing performance-sensitive data in C and exposing it to MicroPython in a way that avoids memory allocation and garbage collection. And so on.
As I said, had I known. However, this is the way one learns. Right?
Sadly, I am out of time. We are shipping the first 250 units next week and I have to be extremely careful about touching code that should not be touched. Any change to code that has already been tested triggers detailed and time-consuming regression testing (which is hard to automate due to the nature of the application). I am currently finalizing testing of the firmware update layer. It is uncovering a few issues here and there but mostly working just fine. BTW, yes, your suggestion to switch to .mpy files was right on point. Wonderful and super simple. I absolutely wasted my time working on minification. Lesson learned.
This has been a very interesting experience. The help I got here has been nothing less than invaluable, both from people who replied directly to my posts and from reading other posts. The way we got here (after having to redesign this one board three times due to the STM32 chip shortage) was by adopting the RP2040 for the final design and needing to deliver a working board faster than we could have by writing the code in C. Hence the head-first dive into MicroPython.