UART data loss despite large RX buffer
Posted: Tue Jun 14, 2022 10:26 am
Hi all
I'm struggling with the internals of the UART IRQ ring buffer mechanism (I guess).
In my project, the RPi Pico is connected to a Waveshare Pico-SIM7020X-NB-IoT (an NB-IoT modem) through serial port 0.
Unfortunately this hardware setup does not offer hardware flow control.
Communication between modem and PICO is through AT commands, quite standard.
I created the UART instance with "large" buffers inside my communication class (the code is listed further down).
I've been digging in the machine_uart.c of the RP2 port (MicroPython v1.18) and have found that the UART uses an interrupt to fill a ring buffer. Despite this IRQ-based receiving, I (seem to?) lose data when I process the incoming data gradually.
I have two different methods for reading data: a 'dumb' one that just waits a long time, and a second (more intelligent?) one that should only wait until a certain expected response, followed by \r\n, has been received. The second method also sends the AT command to the modem first. Because the modem sometimes sends more data than we need, the remainder stays in the RX buffer; that's why I purge the read buffer before sending the 'next' AT command.
The code for method one ("dumb") and method two ("smart") is listed further down. Just to give complete info, the "safe_decode" helper is listed there as well; it keeps the program from crashing when strange, non-decodable byte sequences are read.
My second method does very well for short messages. Instead of waiting a 'safe' time (e.g. 200 ms) for the response to an AT command (e.g. some modem setting), I only wait until an "OK" or "ERROR" has been received. That gives me a huge time gain, as these short answers come in quickly most of the time. As modem response delays are not predictable, I can set the time-out to a safe 2000 ms without always having to wait that long...
BUT... when I use my smart method on long messages, things go wrong: I'm not able to receive all data, the method exits with a time-out. Debugging showed me that at a certain moment, no more data is available (uart.any() = 0), but I'm sure I didn't receive all.
I guess there may be a FIFO overrun in the Pico hardware, but I presumed that, because the Pico UART FIFO is read by the uart_service_interrupt routine, I wouldn't lose data...
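To make the overrun hypothesis concrete, here is a toy model of a fixed-size ring buffer that drops incoming bytes once it is full. It only illustrates the failure mode in general: ToyRingBuffer and all of its names are made up for this sketch and are not how machine_uart.c is actually written.

```python
class ToyRingBuffer:
    """Fixed-size byte ring buffer that drops new bytes when full (toy model)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0      # next write position
        self.tail = 0      # next read position
        self.count = 0     # bytes currently stored
        self.dropped = 0   # bytes lost because the buffer was full

    def put(self, byte):
        """Called by the (simulated) RX interrupt for every received byte."""
        if self.count == self.size:
            self.dropped += 1
            return False
        self.buf[self.head] = byte
        self.head = (self.head + 1) % self.size
        self.count += 1
        return True

    def any(self):
        return self.count

    def read(self, n=None):
        """Return up to n bytes (all available bytes if n is None)."""
        if n is None or n > self.count:
            n = self.count
        out = bytearray(n)
        for i in range(n):
            out[i] = self.buf[self.tail]
            self.tail = (self.tail + 1) % self.size
        self.count -= n
        return bytes(out)


rb = ToyRingBuffer(8)
for b in b"0123456789":   # 10 bytes arrive before anything is read
    rb.put(b)
first = rb.read()         # only the bytes that fitted are returned
for b in b"AB":
    rb.put(b)
second = rb.read()
print(first, second, rb.dropped)
```

In this simplified picture, bytes are only lost when the reader falls a full buffer behind. With rxbuf=2000 and an answer of roughly 150 bytes, the ring buffer size alone would not obviously explain the missing data.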
Example data received just waiting 10 seconds with the "dumb" method:
AT+CENG?\r\r\n+CENG: 6151,3,240,"0175C3C9",-101,-12,-88,0,20,"4661",0,,-90\r\n+CENG: 6151,3,289,-102\r\n+CENG: 6151,3,501,-104\r\n+CENG: 6151,3,251,-107\r\n\r\nOK\r\n
Using method two, I only get to receive:
AT+CENG?\r\r\n+CENG: 6151,3,273,"0175A8C9",-
I find this very annoying, because I made this second method mainly to save time on the AT commands that produce longer answers. As those tend to take a while before the first character arrives, I now need very long time-outs to be sure I have received everything... Receiving everything (mostly) means that the answer ends with an "OK" or "ERROR", so I thought the "smart" method was the right solution for my problem. But as stated above: I tend not to receive everything while checking whether I have received everything...
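The time-out trade-off described above (a long wait before the first character, then a fast stream) is often handled with two separate limits: a long one while nothing has arrived yet, and a short idle limit that restarts on every new chunk. The following is a minimal sketch of that idea, not a fix for bytes that go missing in the driver. read_response, FakeUART and the time-out values are hypothetical names and numbers chosen for illustration; FakeUART only stands in for machine.UART so the sketch can run on a PC.

```python
import time


def _now_ms():
    # time.ticks_ms() exists on MicroPython; fall back to monotonic() elsewhere.
    if hasattr(time, "ticks_ms"):
        return time.ticks_ms()
    return int(time.monotonic() * 1000)


def _elapsed_ms(start):
    if hasattr(time, "ticks_diff"):
        return time.ticks_diff(_now_ms(), start)
    return _now_ms() - start


def read_response(uart, terminators=(b"OK\r\n", b"ERROR\r\n"),
                  first_byte_timeout_ms=2000, idle_timeout_ms=200):
    """Collect bytes until the buffer ends with one of the terminators.

    Returns (data, complete). The long time-out applies only while nothing
    has arrived; afterwards the shorter idle time-out applies.
    """
    resp = b""
    last_activity = _now_ms()
    while True:
        if uart.any():
            chunk = uart.read()
            if chunk:
                resp += chunk
                last_activity = _now_ms()   # restart the idle timer
                if any(resp.endswith(t) for t in terminators):
                    return resp, True
        else:
            limit = idle_timeout_ms if resp else first_byte_timeout_ms
            if _elapsed_ms(last_activity) >= limit:
                return resp, False


class FakeUART:
    """Stand-in for machine.UART that hands out prepared chunks (demo only)."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def any(self):
        return len(self.chunks[0]) if self.chunks else 0

    def read(self):
        return self.chunks.pop(0) if self.chunks else None


uart = FakeUART([b"AT+CENG?\r\r\n+CENG: 6151,3,289,-102\r\n", b"\r\nOK\r\n"])
data, complete = read_response(uart, idle_timeout_ms=50)
print(complete, data)
```

The returned flag makes it possible to tell a complete answer from one that stalled halfway, which the original method (returning "" on time-out) does not distinguish.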
Am I missing something? Advice, hints or a solution would be very much appreciated!
UART instance with "large" buffers (inside my communication class):
```python
self.uart = machine.UART(0, 115200, tx=Pin(0), rx=Pin(1), bits=8, parity=None, stop=1, rxbuf=2000, txbuf=1000)
```
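As a rough sanity check on these settings (back-of-the-envelope arithmetic, not something taken from the MicroPython sources): at 115200 baud with 8 data bits, no parity and 1 stop bit, each byte occupies 10 bit times on the wire. The 32-byte figure below is the depth of the RP2040's hardware RX FIFO as far as the datasheet goes; treat it as an assumption of this sketch.

```python
BAUD = 115200
BITS_PER_BYTE = 1 + 8 + 0 + 1   # start + data + parity (none) + stop


def fill_time_ms(n_bytes, baud=BAUD, bits_per_byte=BITS_PER_BYTE):
    """Time in milliseconds for n_bytes to arrive back to back."""
    return n_bytes * bits_per_byte * 1000 / baud


byte_time_us = fill_time_ms(1) * 1000
fifo_ms = fill_time_ms(32)      # assumed 32-byte hardware RX FIFO
rxbuf_ms = fill_time_ms(2000)   # rxbuf=2000 from the constructor call

print("one byte: %.1f us" % byte_time_us)
print("32-byte FIFO fills in %.2f ms" % fifo_ms)
print("2000-byte rxbuf fills in %.1f ms" % rxbuf_ms)
```

An AT+CENG? answer of roughly 150 bytes, like the example in this post, would take about 13 ms of wire time if the modem sent it back to back.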
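Method one ("dumb"):
```python
from time import ticks_ms, ticks_diff  # MicroPython millisecond tick helpers

def waitResp(self, timeout=2000):
    prvMills = ticks_ms()
    resp = b""
    # Always wait for the full timeout, collecting one byte at a time.
    while ticks_diff(ticks_ms(), prvMills) < timeout:
        if self.uart.any():
            resp = b"".join([resp, self.uart.read(1)])
    return self.safe_decode(resp, "waitResp")
```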
Method two ("smart"):
```python
def sendCMD_waitSpecificResponse(self, cmd, timeout=2000, responseList=["OK", "ERROR"]):
    resp = b""
    decodedString = ""
    responseFromListReceived = False
    # purge buffer
    while self.uart.any():
        self.uart.read(1)
    prvMills = ticks_ms()
    self.uart.write((cmd + '\r\n').encode())
    while (ticks_diff(ticks_ms(), prvMills) < timeout) and not responseFromListReceived:
        if self.uart.any():
            # read everything currently available and re-check the whole buffer
            resp = b"".join([resp, self.uart.read()])
            decodedString = self.safe_decode(resp, "sendCMD_waitSpecificResponse")
            for expectedResponse in responseList:
                if decodedString.rfind(expectedResponse) > 0:
                    # only accept the response once its terminating \r\n has arrived
                    if decodedString.rfind(expectedResponse) < decodedString.rfind("\r\n"):
                        responseFromListReceived = True
                        startExpectedResponse = decodedString.rfind(expectedResponse)
                        endExpectedResponse = decodedString.find("\r\n", startExpectedResponse)
                        return decodedString[startExpectedResponse:endExpectedResponse]
    # only getting here in case of a time-out
    return ""
```
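The completion test inside the loop above can also be written as a small pure function that works directly on the accumulated bytes, so nothing has to be decoded on every pass and the check can be tried on a PC. This is only a sketch: find_final_response is a made-up name, and matching the start of a complete line (instead of a substring anywhere, as rfind does) is an assumption about what is wanted.

```python
def find_final_response(buf, expected=(b"OK", b"ERROR")):
    """Return the first complete line in buf that starts with an expected token.

    A line only counts once its terminating CR LF has arrived.
    Returns None while no such line is present.
    """
    lines = buf.split(b"\r\n")
    # The last element is whatever follows the final CR LF: an incomplete line.
    for line in lines[:-1]:
        for token in expected:
            if line.startswith(token):
                return line
    return None


full = b'AT+CENG?\r\r\n+CENG: 6151,3,289,-102\r\n\r\nOK\r\n'
partial = b'AT+CENG?\r\r\n+CENG: 6151,3,273,"0175A8C9",-'
print(find_final_response(full))
print(find_final_response(partial))
```

Because the function has no side effects, the two captures from this post can be fed to it directly to confirm that only the complete one is accepted.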
The "safe_decode" helper:
```python
def safe_decode(self, instr, call_reference):
    try:
        return instr.decode()
    except Exception as e:
        # log the problem and return an empty string instead of crashing
        self.log.error("Exception while decoding binary string \"" + repr(e) + "\" in: " + call_reference)
        return ""
```
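One property of safe_decode as written: if a single byte cannot be decoded, it returns an empty string for the whole buffer. In the "smart" method the buffer keeps accumulating, so after one bad byte every later pass would fail to decode as well and the expected response could never be found. Whether that plays a role here cannot be told from the post. A possible alternative, sketched under the assumption that the modem's AT responses are plain ASCII, is to replace undecodable bytes one by one; ascii_lossy is a hypothetical helper name.

```python
def ascii_lossy(data, placeholder="?"):
    """Convert bytes to str, replacing every non-ASCII byte with a placeholder."""
    return "".join(chr(b) if b < 128 else placeholder for b in data)


raw = b"+CENG: 6151\xff,3\r\n\r\nOK\r\n"   # one stray byte in the middle
text = ascii_lossy(raw)
print(repr(text))
```

With this approach a stray byte shows up as a visible placeholder while the surrounding "OK" or "ERROR" line stays searchable.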

