@micropython.native not implemented too long

General discussions and questions abound development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
User avatar
jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: @micropython.native not implemented too long

Post by jimmo » Tue Jul 12, 2022 11:38 am

martincho wrote:
Tue Jul 12, 2022 7:46 am
What's interesting is that the source files, unmodified, run just fine on the RP2040.
Are you running mpy-cross with exactly the same configuration as the board is using, specifically `march=armv6m`? It should be

Code: Select all

$ mpy-cross -march=armv6m <path>
My suspicion is that your function code was right on the edge of being too big, and a slight change (i.e. minifier or mpy-cross with a slightly different build version or configuration) was enough to push it over the edge.

But further to what Peter already said... if you're hitting this error then your function must be huge. Something's not right.
martincho wrote:
Tue Jul 12, 2022 7:46 am
Well, not sure what to do. All I can do is remove one @micropyhon.native decorator at a time and try again. Which, of course, burns time at a rate proportional to the square of your desperation to get the job done.
In general yes of course, but in this case you know exactly which file is causing the problem, and how many functions do you have in one file that are that big? Surely there's got to be an obvious candidate that you can split into two functions. We're talking about a >12bit branch here!
martincho wrote:
Tue Jul 12, 2022 7:46 am
I think I have probably wasted days chasing after errors that do not provide sufficient information for one to at least get close to the problem with a reasonable effort. I had one case where I got a unicode error and the only traceback offered was, effectively main(). The source file for that one was nearly 2000 lines of code. It probably took a couple of hours to find the problem.
It would be good to understand this better because what you're describing sounds like a bug in the compiler? (Somehow no line numbers and no traceback... was the optimisation level set to disable line numbers?)

There's definitely an art to quickly finding errors in Python code. Any time you have code that re-raises, or cancellation etc it does get tricky.

You might also be interested in the recent work to add PDB support to MicroPython. See this video from last week for a summary https://www.youtube.com/watch?v=RRN_ngx01vM

Also at least with MicroPython you got an error at all! :) Think about the hours you could have spent tracking that down in C code.

Obligatory comment about how MicroPython has to keep error messages short and omits a few checks in order to keep the firmware small.... but back to the original problem with the function being too big, the error is pretty clear!

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: @micropython.native not implemented too long

Post by martincho » Tue Jul 12, 2022 2:59 pm

pythoncoder wrote:
Tue Jul 12, 2022 10:03 am
I suggest you review your use of micropython.native. Like all optimisations it's best only to apply it where there is a true performance bottleneck.

My approach is to identify which part of the code is causing the slowdown and optimise it until it sqeals. micropython.native rarely makes a large difference. micropython.viper makes a big difference but needs some effort to use. For even better performance consider inline assembler or C modules.
Precisely my approach. While I do apply the micropython.native to most functions, a good number of them don't need it. And, yes, I do optimize code to the absolute limit before resorting to such tools. I take this as far as measuring execution time of statements and lines of code down to the microsecond with the logic analyzer.

Image

The image above shows four simultaneous communications channels and a couple of status flags. One of the key optimizations was the communications latency between two ports, seen between the first and third lines. In this case it is down to 268 microseconds, or about 3 bytes at 115,200 baud. I need it to be closer to one character but that is simply impossible with MicroPython unless you do absolutely nothing with the data as the bytes arrive. If I didn't use 'native on this portion of the code the latency would be much greater.

Also, note what happens to the spacing of the bytes on line 3. The bytes should come out one after the other, but due to MicroPython overhead I simply cannot get around, I get gaps of as much as 125 microseconds. Believe me when I say I have optimized this a million different ways. When accessing a value in a variable can take 100 microseconds there isn't much you can do other than come down to the most basic types to reduce that to, maybe 20 microseconds or so.

The fifth line shows two pulses, the first is the busy flag from the first port (top trace) and the second pulse is the time it takes for that port's packet processor to do its job and issue a response. These pulses are 903 and 578 microseconds, respectively. The busy time is deeply affected by the character-to-character gap you see on trace #3. The packet processor processing time has been optimized to the best of my abilities.

This is how microseconds turn into milliseconds. Right now the time from the start of packet to the packet processor done state is about 2 milliseconds. This is the equivalent of 23 characters at 115,200 baud to fully process a 5 character packet. This is how a bunch of microseconds turn into milliseconds. When I started this time was significantly greater than this. I got down to 2 ms by shaving a few microseconds here and there and restructuring code multiple times. Without micropython.native this is likely impossible.

I can't use 'viper because, well, I didn't know enough to be able to structure the code to be able to use it. Once I realized I could not get any more performance out of 'native it was simply too late to re-engineer this almost from scratch.

The only way to do better would be to recode the processor in assembler, which would require setting-up a block of memory with a bytes() object to effectively create the equivalent of a set of memory-mapped data structures at fixed locations one could access from assembler code. That's the way we used to do it in the old days of 6502, 8051, etc., 8-bit microcontrollers.

The point is that I think --not so full of myself to say 'I know'-- I have trimmed as much fat as possible at the MicroPython level. If I knew what I know today when this project started I would have implemented the communications layer in C and compiled it into MicroPython as a service to the rest of the application, which could then use MP without serious concerns about performance. I would also provide what I would call data services from C to the MicroPython layer by managing performance-sensitive data in C and providing access to MP in a manner that would not cause memory allocation and garbage collection. Etc.

As I said, had I known. However, this is the way one learns. Right?

Sadly I am out of time. We are shipping the first 250 units next week and I have to be extremely careful about touching code that should not be touched. Any changes to code that has already been tested triggers detailed and time-consuming regression testing (which is hard to automate due to the nature of the application). I am currently finalizing testing of the firmware update layer. This is uncovering a few issues here and there but mostly working just fine. BTW, yes, your suggestion to switch to mpy files was right on point. Wonderful and super-simple. I absolutely wasted my time working with minification. Lesson learned.

This has been a very interesting experience. The help I got here has been nothing less than invaluable, both from people who replied directly to my posts and just reading other posts. The way we got here --after having to redesign this one board three times due to the STM32 chip shortage issues-- was by having to adopt the RP2040 for the final design and having to deliver a working board faster than could have been achieved by having to write the code in C. Hence the head-first dive into MicroPython.

User avatar
pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK
Contact:

Re: @micropython.native not implemented too long

Post by pythoncoder » Tue Jul 12, 2022 4:28 pm

Interesting. As a matter of interest do you use the second core?
Peter Hinch
Index to my micropython libraries.

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: @micropython.native not implemented too long

Post by martincho » Tue Jul 12, 2022 4:32 pm

jimmo wrote:
Tue Jul 12, 2022 11:38 am
Are you running mpy-cross with exactly the same configuration as the board is using, specifically `march=armv6m`? It should be
Yes, exactly that configuration.
But further to what Peter already said... if you're hitting this error then your function must be huge. Something's not right.
Not really. Well, at least I don't think so. After getting some rest I went through and commented-out the 'native decorators one-by-one until mpy generation worked. This is the culprit (all comments deleted):

Code: Select all

def receive_file(self, packet):
    if packet and ("cmd" in packet) and ("data" in packet):
        if packet["cmd"] == _CMD_FILE_TRANSFER:
            data = packet["data"]
            if len(data) >= _FTP_DATA_MIN_LENGTH:
                sequence = data[_FTP_SEQUENCE_BYTE_0] * 256 + data[_FTP_SEQUENCE_BYTE_1]
                if self.file_receive_timeout == 0 and self.file_receive_state == _FTP_WAITING_FOR_DATA:
                    print("File transfer timeout")
                    self.file_receive_state = _FTP_IDLE
                    self.npt_status = _NPT_IDLE
                if self.file_receive_state == _FTP_IDLE:
                    received_name = ""
                    try:
                        received_name = bytes(data[_FTP_FILE_NAME_INDEX:]).decode()
                    except UnicodeError:
                        print("Invalid file name")
                    else:
                        if _FTP_FILE_NAME == received_name:
                            if sequence == _FTP_START_SEQUENCE:
                                print(f"\nReceiving update file: {_FTP_FILE_NAME}")
                                self.file_receive_sequence = 0
                                self.file_receive_timeout = _FILE_TRANSFER_TIMEOUT
                                self.file_buffer_index = 0
                                fp = open("/" + _FTP_FILE_NAME, "wb")
                                fp.close()
                                self.npt_status = _NPT_RECEIVING_FILE
                                self.file_receive_state = _FTP_WAITING_FOR_DATA
                elif self.file_receive_state == _FTP_WAITING_FOR_DATA:
                    gc.collect()
                    if sequence == _FTP_END_SEQUENCE:
                        if self.file_buffer_index:
                            with open("/" + _FTP_FILE_NAME, "ab") as fp:
                                fp.write(self.file_buffer[:self.file_buffer_index])
                            self.file_buffer_index = 0
                        self.npt_status = _NPT_FILE_RECEIVED_OK
                        print(f"File received: {_FTP_FILE_NAME}")
                        self.file_receive_state = _FTP_IDLE
                    else:
                        if sequence == self.file_receive_sequence + 1:
                            self.file_receive_sequence += 1
                            self.file_receive_timeout = _FILE_TRANSFER_TIMEOUT
                            new_data = bytes(data[_FTP_FILE_DATA_INDEX:])
                            start = self.file_buffer_index
                            self.file_buffer_index += len(new_data)
                            self.file_buffer[start : self.file_buffer_index] = new_data
                            if self.file_buffer_index >= _FTP_BUFFER_SIZE:
                                with open("/" + _FTP_FILE_NAME, "ab") as fp:
                                    fp.write(self.file_buffer[:self.file_buffer_index])
                                self.file_buffer_index = 0
                        else:
                            self.npt_status = _NPT_FILE_RECEIVED_ERROR
                            self.file_receive_state = _FTP_IDLE
                else:
                    self.npt_status = _NPT_IDLE
                    self.file_receive_state = _FTP_IDLE
Don't let the use of "FTP" confuse you. I am just using the nomenclature to refer to my own file transfer protocol.

As I said, runs fine on RP2040 without converting to mpy. Attempting to make the module that contains this bit of code into a precompiled mpy generates that error unless I comment out the 'native decorator.

Not sure it makes a difference. This is a method in a class. Every method in the class is marked at 'native except for __init__() and most getter/setter methods.
In general yes of course, but in this case you know exactly which file is causing the problem, and how many functions do you have in one file that are that big?
I think this is precisely the problem. If we agree on that the above function isn't large at all, you can see that hunting for this error on a large file with lots of functions can be very time consuming. The obvious long functions is what I go for first. And then, what?

It would be very useful if the "too long" error message provided a line number or function name. I have not messed with the optimization level at all, so line numbers are still there for most error messages.
You might also be interested in the recent work to add PDB support to MicroPython. See this video from last week for a summary https://www.youtube.com/watch?v=RRN_ngx01vM
Interesting. Thanks.
Think about the hours you could have spent tracking that down in C code.
Well, I think the big difference is that writing performant code in C would be so much easier that virtually none of of the problems would be issues at all. As I said in my prior post, had I known what I know now at the start of this project I would have written the real time portions of this application in C and compiled it into MicroPython. That would have been the way to go, no question about it. Hindsight is a wonderful thing, isn't it?
Obligatory comment about how MicroPython has to keep error messages short and omits a few checks in order to keep the firmware small.... but back to the original problem with the function being too big, the error is pretty clear!
Agreed and understood. Just want to clarify one more time that I am not complaining at all. A bit of frustration under pressure? Sure. I do understand though and I am trying to help to the extent possible at the moment.

Do you think that function is too long? Unless I am missing something fundamental, I'd be surprised if you said yes. For example, there's a call to gc.collect() in there. There's some file I/O. Do these go into the accounting of method length for 'native code generation? Should a 'native method not have any calls whatsoever to other functions at all?

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: @micropython.native not implemented too long

Post by martincho » Tue Jul 12, 2022 4:41 pm

pythoncoder wrote:
Tue Jul 12, 2022 4:28 pm
Interesting. As a matter of interest do you use the second core?
I will in the future. I hope you agree with this: I have been avoiding using the second core because I don't want to fall into the trap of throwing hardware at bad code. It's all too easy to write bad code and "fix it" by using the second core. Being that this is my first serious MicroPython project, my code has definitely been bad --from one or more perspectives-- for some time. There could still be non-optimal code in there. I doubt it, yet it would not be impossible at all. I'll have more time to dive into this in a few weeks.

Here's another way to put it: I could definitely write this application in C on, say, a 25 MHz 8051-derivative (1 clock cycle per instruction) 8 bit processor and it would run just fine. Writing it on a 160 MHz processor with four times the RAM and an ocean of flash should not require that I resort to using two cores.

In a future implementation I would like to take the baud rate up significantly higher, perhaps 5x to 10x. First I'll re-implement the communications code in C, compile into MicroPython and see what a single core can do. Then, I think, it will make sense to have the second core manage real-time communications and let the first core handle the lazy application layer.

Post Reply