UART data loss despite large RX buffer

RP2040 based microcontroller boards running MicroPython.
Target audience: MicroPython users with an RP2040 boards.
This does not include conventional Linux-based Raspberry Pi boards.
Post Reply
Filip_KONAX
Posts: 16
Joined: Wed Jun 08, 2022 1:05 pm

UART data loss despite large RX buffer

Post by Filip_KONAX » Tue Jun 14, 2022 10:26 am

Hi all

I'm struggling with the internals of the UART IRQ ringbuffer mechanism (I guess)

In my project; the RPi PICO is connected to a Waveshare Pico-SIM7020X-NB-IoT (modem for NB-IoT) through serial port 0.
Unfortunately this hardware setup does not offer hardware flow control.
Communication between modem and PICO is through AT commands, quite standard.
I created the UART instance with "large" buffers inside my communication class:

Code: Select all

self.uart = machine.UART(0, 115200, tx=Pin(0), rx=Pin(1), bits=8, parity=None, stop=1, rxbuf=2000, txbuf=1000)
I've been digging in the machine_uart.c of the RP2 port (MicroPython v1.18) and have found that the UART is using the interrupt mechanism to fill a ringbuffer. Despite this IRQ based data receiving, I (seem to?) loose data when I gradually process the incoming data.

I have two different methods for reading data: a 'dumb' one that just waits a long time and a second (more intelligent?) one that should only wait until a certain expected response has been received (+\r\n). This method also sends the AT command first to the modem. Because sometimes more data is sent by the modem than what we need, the remainder of the data remains in the rxbuffer (that's why I purge the read buffer before I proceed to sending the 'next' AT command).

Method one ("dumb"):

Code: Select all

    def waitResp(self, timeout=2000):
        prvMills = ticks_ms()
        resp = b""
        while ticks_diff(ticks_ms(),prvMills) < timeout:
            if self.uart.any():
                resp = b"".join([resp, self.uart.read(1)])
        return self.safe_decode(resp,"waitResp")
Method two ("smart"):

Code: Select all

    def sendCMD_waitSpecificResponse(self, cmd, timeout=2000, responseList = ["OK","ERROR"]):
        resp = b""
        decodedString = ""
        responseFromListReceived = False

        #purge buffer
        while self.uart.any() : self.uart.read(1)

        prvMills = ticks_ms()
        self.uart.write((cmd + '\r\n').encode()))

        while (ticks_diff(ticks_ms(),prvMills) < timeout) and not responseFromListReceived:
            if self.uart.any():
                resp = b"".join([resp, self.uart.read()])
                decodedString = self.safe_decode(resp,"sendCMD_waitSpecificResponse")
                for expectedResponse in responseList:
                    if decodedString.rfind(expectedResponse) > 0:
                        if decodedString.rfind(expectedResponse) < decodedString.rfind("\r\n"):
                            responseFromListReceived = True
                            startExpectedResponse = decodedString.rfind(expectedResponse)
                            endExpectedResponse = decodedString.find("\r\n",startExpectedResponse)
                            return decodedString[startExpectedResponse:endExpectedResponse]
        #only getting here in case of a time-out
        return ""
Just to give complete info: the "safe_decode" is here (not to crash the program when strange, non-decodable byte sequences are read)

Code: Select all

    def safe_decode(self,instr, call_reference):
        try:
            return instr.decode()
        except Exception as e:
            self.log.error("Exception while decoding binary string \"" + repr(e) + " in : " + call_reference)
            return ""
My second method is doing very well for short messages. Instead of waiting a 'safe' time (e.g. 200ms) on the response of an AT command (e.g. some modem setting), I only wait for an "OK" or "ERROR" to be received. That give me a huge time gain; as these short answers come in quickly most of the time. As modem response delays are not predictable, I can set the time-out on a safe 2000ms without always having to wait that long...
BUT... when I use my smart method on long messages, things go wrong: I'm not able to receive all data, the method exits with a time-out. Debugging showed me that at a certain moment, no more data is available (uart.any() = 0), but I'm sure I didn't receive all.
I guess there is a maybe a FIFO buffer overrun in the PICO hardware, but I presumed that because the reading of the PICO UART FIFO is done by the uart_service_interrupt routine, I wouldn't loose data...

Example data received just waiting 10 seconds with the "dumb" method:
AT+CENG?\r\r\n+CENG: 6151,3,240,"0175C3C9",-101,-12,-88,0,20,"4661",0,,-90\r\n+CENG: 6151,3,289,-102\r\n+CENG: 6151,3,501,-104\r\n+CENG: 6151,3,251,-107\r\n\r\nOK\r\n

Using method two, I only get to receive:
AT+CENG?\r\r\n+CENG: 6151,3,273,"0175A8C9",-

I find this very annoying, because I made this second method mainly for gaining time with the AT commands that produce longer answers. As they tend to take a while before the first character arrives, I now need to put in very long time-outs to be sure to have received all... Receiving all means (mostly) that the answer ends with an "OK" or "ERROR", so I guess the "smart" method was likely to be the good solution for my problem. But as stated above: I tend not to receive all when checking whether I received all...

Am I missing something :? ? Advice, hints or solution very much appreciated :D !

User avatar
pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK
Contact:

Re: UART data loss despite large RX buffer

Post by pythoncoder » Tue Jun 14, 2022 12:09 pm

Since your data appears to be newline-terminated why not use UART.readline()? This is usually the most efficient way to read such data.

You might also consider uasyncio.StreamReader as in this demo.

I've done some fairly severe stress testing on the RP2 UART. With current firmware I've not been able to fault it.
Peter Hinch
Index to my micropython libraries.

Filip_KONAX
Posts: 16
Joined: Wed Jun 08, 2022 1:05 pm

Re: UART data loss despite large RX buffer

Post by Filip_KONAX » Tue Jun 14, 2022 12:28 pm

Thank you for your reply. I've altered my code using UART.readline(), but the same problem remains.
If I start processing the received data during reception, data gets lost...
Would there be possibly any lines in my "smart method" that inhibit the IRQ?
Considering your stess tests: do they also test the good working of the ringbuffer while doing other things with the PICO?
As the UART RP2 port uses interrupts, serial reception is expected to be flawless while doing other things or am I wrong?
pythoncoder wrote:
Tue Jun 14, 2022 12:09 pm
Since your data appears to be newline-terminated why not use UART.readline()? This is usually the most efficient way to read such data.

You might also consider uasyncio.StreamReader as in this demo.

I've done some fairly severe stress testing on the RP2 UART. With current firmware I've not been able to fault it.

KJM
Posts: 158
Joined: Sun Nov 18, 2018 10:53 pm
Location: Sydney AU

Re: UART data loss despite large RX buffer

Post by KJM » Wed Jun 15, 2022 6:36 am

Quitting the uart.read loop when you've found what you want leads to data truncation on that combo. By trial & error I found that waiting 200 cycles of the loop (~100ms) for extra data to dribble in solves the problem, which seems to be with the 7020 rather than the pico.

Filip_KONAX
Posts: 16
Joined: Wed Jun 08, 2022 1:05 pm

Re: UART data loss despite large RX buffer

Post by Filip_KONAX » Wed Jun 15, 2022 7:55 am

Hi, thank you for your reply, but I'm not confident that your solution will solve the problem, but I'll give it a try anyway.
Why? Because I'm not quitting the loop before I've found the matching pattern + an eol and the data reception stalls before reaching that point (see my examples of received data) So, it's exactly while using the "matching algorithm" that we don't get to receive the data that enables quitting the loop :? :(
I guess I'll need to dive into the source code of the different lines executed during loop to find out what's inhibiting the ISR to run properly.
KJM wrote:
Wed Jun 15, 2022 6:36 am
Quitting the uart.read loop when you've found what you want leads to data truncation on that combo. By trial & error I found that waiting 200 cycles of the loop (~100ms) for extra data to dribble in solves the problem, which seems to be with the 7020 rather than the pico.

User avatar
pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK
Contact:

Re: UART data loss despite large RX buffer

Post by pythoncoder » Wed Jun 15, 2022 9:06 am

Filip_KONAX wrote:
Tue Jun 14, 2022 12:28 pm
...
Considering your stess tests: do they also test the good working of the ringbuffer while doing other things with the PICO?
As the UART RP2 port uses interrupts, serial reception is expected to be flawless while doing other things or am I wrong?
...
My testing was of concurrent sending and receiving between one Pico and another. Received data was checked against expected data and the test was halted if any discrepancy occurred. I also tested concurrent sending from each of the two cores. These tests revealed two rarely occurring concurrency bugs in the firmware which culminated in this PR which has fixed these problems. The test scripts then ran for >100K messages without error.

Of course this doesn't prove the absence of further bugs.

The Pico uses hard ISR's so Python code running concurrently should have no effect on reception.
Peter Hinch
Index to my micropython libraries.

User avatar
Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: UART data loss despite large RX buffer

Post by Roberthh » Fri Jun 17, 2022 3:01 pm

If I try to run a simplified version of your code on a RP2 pair, having one RP2 to simulate the modem, everything works fine.
Just for interest: Do you have somewhere in the code call this method an unconditional try: / except: pass clause? In that case, it would catch coding and data errors. my reduced test code:

Code: Select all

from machine import UART
from time import ticks_ms, ticks_diff
uart=UART(0, 9600)

def sendCMD_waitSpecificResponse(cmd, timeout=2000, responseList = ["OK","ERROR"]):
    resp = b""
    decodedString = ""
    responseFromListReceived = False

    #purge buffer
    while uart.any() : uart.read(1)

    prvMills = ticks_ms()
    uart.write((cmd + '\r\n').encode())

    while (ticks_diff(ticks_ms(),prvMills) < timeout) and not responseFromListReceived:
        if uart.any():
            resp = b"".join([resp, uart.read()])
            decodedString = resp.decode()
            for expectedResponse in responseList:
                if decodedString.rfind(expectedResponse) > 0:
                    if decodedString.rfind(expectedResponse) < decodedString.rfind("\r\n"):
                        responseFromListReceived = True
                        startExpectedResponse = decodedString.rfind(expectedResponse)
                        endExpectedResponse = decodedString.find("\r\n",startExpectedResponse)
                        return decodedString[startExpectedResponse:endExpectedResponse]
    #only getting here in case of a time-out
    return ""
Works as well at 115200 baud.

Filip_KONAX
Posts: 16
Joined: Wed Jun 08, 2022 1:05 pm

Re: UART data loss despite large RX buffer

Post by Filip_KONAX » Fri Jun 17, 2022 6:18 pm

Hi Robert!

Thank you very much for trying this out. I'm wondering:
A) What data sequences and what responseList you did the testing with?
B) Is data coming at regular intervals or e.g. first a bit, then a pause and then the rest, containing the response you are looking for?

Would you be able to test with this sequence, this is a realistic example of when things go wrong:
PICO 1 sends the data with cmd="AT+CENG?" and responseList=["+CENG:"]

PICO 2 answers with: AT+CENG?\r\r\n quite immediately (as the modem echos the sent commands) and only some time later (eg. 1 second): +CENG: 6151,3,240,"0175C3C9",-101,-12,-88,0,20,"4661",0,,-90\r\n+CENG: 6151,3,289,-102\r\n+CENG: 6151,3,501,-104\r\n+CENG: 6151,3,251,-107\r\n\r\nOK\r\n
This should produce a return value equal to +CENG: 6151,3,240,"0175C3C9",-101,-12,-88,0,20,"4661",0,,-90"
I can try to capture even more realistic timings if this "doesn't do the trick"
Roberthh wrote:
Fri Jun 17, 2022 3:01 pm
If I try to run a simplified version of your code on a RP2 pair, having one RP2 to simulate the modem, everything works fine.
Just for interest: Do you have somewhere in the code call this method an unconditional try: / except: pass clause? In that case, it would catch coding and data errors. my reduced test code:

Code: Select all

from machine import UART
from time import ticks_ms, ticks_diff
uart=UART(0, 9600)

def sendCMD_waitSpecificResponse(cmd, timeout=2000, responseList = ["OK","ERROR"]):
    resp = b""
    decodedString = ""
    responseFromListReceived = False

    #purge buffer
    while uart.any() : uart.read(1)

    prvMills = ticks_ms()
    uart.write((cmd + '\r\n').encode())

    while (ticks_diff(ticks_ms(),prvMills) < timeout) and not responseFromListReceived:
        if uart.any():
            resp = b"".join([resp, uart.read()])
            decodedString = resp.decode()
            for expectedResponse in responseList:
                if decodedString.rfind(expectedResponse) > 0:
                    if decodedString.rfind(expectedResponse) < decodedString.rfind("\r\n"):
                        responseFromListReceived = True
                        startExpectedResponse = decodedString.rfind(expectedResponse)
                        endExpectedResponse = decodedString.find("\r\n",startExpectedResponse)
                        return decodedString[startExpectedResponse:endExpectedResponse]
    #only getting here in case of a time-out
    return ""
Works as well at 115200 baud.

User avatar
Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: UART data loss despite large RX buffer

Post by Roberthh » Fri Jun 17, 2022 7:02 pm

A) What data sequences and what responseList you did the testing with?
I used the response of your example. As command I accepted any string.
B) Is data coming at regular intervals or e.g. first a bit, then a pause and then the rest, containing the response you are looking for?
The response was sent with uart.write(), so it comes in one chunk.

About your test: It works, and returns the last section of the message: '+CENG: 6151,3,251,-107'
That is OK, because your code is set up this way. If I do a subtle change at the code, changing these to lines;

Code: Select all

        if n := uart.any():
            resp = b"".join([resp, uart.read(n)])
I get the first part of the message response: '+CENG: 6151,3,240,"0175C3C9",-101,-12,-88,0,20,"4661",0,,-90'

The reason: uart.read() reads anything that's present, but waits for a timeout, if more data arrives. uart.read(n) reads exactly n symbols and returns immediately.

Post Reply