Terminal UTF-8 text corruption

Citylights
Posts: 13
Joined: Sat Jul 10, 2021 7:31 pm

Post by Citylights » Sun Jul 11, 2021 10:11 am

Hi,

I seem to have found something that looks like a bug in MicroPython (v1.16.uf2): text printed
to the terminal is randomly corrupted when it contains UTF-8 characters.

I'm too new to ARM microcontrollers and MicroPython to do much research into how and why
this happens, but I've noticed that the corrupted text only appears in the terminal. If I save
the same text to a file, it appears normal there.

There was a discussion about this on another forum, and a member there could confirm the issue
by running the same code I'm posting below. He is also a member here, so he can jump into
this thread if he wants to say more about it.

import machine
import utime

led = machine.Pin(25, machine.Pin.OUT)  # on-board LED (Pico)
sensor_temp = machine.ADC(4)            # internal temperature sensor (Pico)
conversion_factor = 3.3 / 65535         # ADC counts to volts

temp_file = 'temp.txt'

while True:
    reading = sensor_temp.read_u16() * conversion_factor
    temperature = 27 - (reading - 0.706) / 0.001721
    # Printed to the terminal: this is where the corruption shows up.
    print("\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1", int(temperature), end="\r")
    led.high()
    # Appended to a file: the same text appears normal here.
    with open(temp_file, 'a') as f:
        f.write("\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1 " + str(temperature) + '\n')
    led.low()
    utime.sleep(7)

Christian Walther
Posts: 169
Joined: Fri Aug 19, 2016 11:55 am

Re: Terminal UTF-8 text corruption

Post by Christian Walther » Sun Jul 11, 2021 2:19 pm

Can you be more specific on what you are doing, what you expect to happen, and what happens instead? And try to give a more minimal reproducing example, I assume the whole temperature measuring stuff is unnecessary and just makes it harder to try for people who don’t have a temperature sensor handy?

I do notice that when I use screen to look at the serial output and issue

print("\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1")

it says ?ε?μοκ?α?ία. However, that is a problem with screen (it appears to filter out bytes that it perceives to be control characters), and the solution is to use screen -U (or miniterm.py --raw). Then it correctly says Θερμοκρασία for me (on a terminal that is set to UTF-8).
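
A minimal sketch of why exactly those four letters turn into question marks, assuming screen without -U treats bytes in the C1 control range 0x80 to 0x9F as control characters. That range is an assumption that fits the explanation above; it has not been checked against screen's source. The helper names c1_bytes and mask_c1 are made up for illustration. The snippet should run on both CPython and MicroPython:

```python
WORD = "\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1"


def c1_bytes(ch):
    """Return the UTF-8 bytes of ch that fall in the C1 range 0x80-0x9F."""
    return [b for b in ch.encode("utf-8") if 0x80 <= b <= 0x9F]


def mask_c1(text):
    """Replace each character whose encoding contains a C1-range byte with '?'."""
    return "".join("?" if c1_bytes(ch) else ch for ch in text)


# Show the encoded bytes of every letter and flag the ones with a C1-range byte.
for ch in WORD:
    encoded = ch.encode("utf-8")
    flag = "<- C1 byte" if c1_bytes(ch) else ""
    print(ch, " ".join("%02X" % b for b in encoded), flag)

print(mask_c1(WORD))  # prints ?ε?μοκ?α?ία
```

The flagged letters are Θ (CE 98), ρ (CF 81, twice) and σ (CF 83), which matches the positions of the question marks in the screen output quoted above.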

Citylights
Posts: 13
Joined: Sat Jul 10, 2021 7:31 pm

Re: Terminal UTF-8 text corruption

Post by Citylights » Sun Jul 11, 2021 2:32 pm

Yes, it should say "Θερμοκρασία". However, as you can see in the photos, random characters
pop up in some of the lines, and I don't think it's the same case as the one you describe.

Sorry for posting all of my code, but I thought it preferable to give a clear picture of
what I was doing when I noticed the text corruption.
Attachments: thonny.png (133.86 KiB), terminal.jpg (87.13 KiB)

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: Terminal UTF-8 text corruption

Post by jimmo » Sun Jul 11, 2021 2:51 pm

Can you try using a different terminal? (I'm not really sure what to suggest, but maybe some Unicode-aware Windows terminal?)

Citylights
Posts: 13
Joined: Sat Jul 10, 2021 7:31 pm

Re: Terminal UTF-8 text corruption

Post by Citylights » Sun Jul 11, 2021 3:06 pm

I already did: one is on an Android phone and the other is the Thonny IDE on a PC.
And if you look closely, the problem is not on every line of text.

You can take the part of my code that prints the text and put it in a loop
to see whether it also happens on the terminal you use.
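
A stripped-down reproducer along those lines might look like the sketch below. It is not the original poster's code: the temperature sensor, LED and file write are left out, each line ends with a newline instead of \r so earlier lines stay visible, and a counter is added so a corrupted line can be identified. The names make_line and run, the line count and the delay are arbitrary choices for illustration:

```python
import time

TEXT = "\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1"


def make_line(i):
    """Build one output line: a counter followed by the Greek test word."""
    return "%d %s" % (i, TEXT)


def run(count=200, delay_s=0.1):
    """Print the test line count times and return how many lines were printed."""
    for i in range(count):
        print(make_line(i))
        time.sleep(delay_s)
    return count


run(5, 0)  # short demo; call run() on the board for a longer test
```

If the corruption only shows up with the full program, adding the file write back into the loop would help narrow down which part matters.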

Christian Walther
Posts: 169
Joined: Fri Aug 19, 2016 11:55 am

Re: Terminal UTF-8 text corruption

Post by Christian Walther » Sun Jul 11, 2021 3:45 pm

Okay, that’s odd. I’m not getting anything like that, screen -U is showing dozens of correct Θερμοκρασία. Does it only happen with non-ASCII characters? I guess you’d need to check with an oscilloscope or logic analyzer to be sure what’s happening.

Citylights
Posts: 13
Joined: Sat Jul 10, 2021 7:31 pm

Re: Terminal UTF-8 text corruption

Post by Citylights » Sun Jul 11, 2021 3:58 pm

Christian Walther wrote:
Sun Jul 11, 2021 3:45 pm
Okay, that’s odd. I’m not getting anything like that, screen -U is showing dozens of correct Θερμοκρασία. Does it only happen with non-ASCII characters? I guess you’d need to check with an oscilloscope or logic analyzer to be sure what’s happening.
Yes, there's no corruption when using English characters.

I don't have an oscilloscope or logic analyzer available. I thought I should report the issue here so people know about it
and can investigate further if they find it important.

Still, I wonder what it can be, since with English letters everything is fine?

scruss
Posts: 360
Joined: Sat Aug 12, 2017 2:27 pm
Location: Toronto, Canada

Re: Terminal UTF-8 text corruption

Post by scruss » Sun Jul 11, 2021 9:00 pm

Can't reproduce on Raspberry Pi Pico with MicroPython v1.16 (2021-06-18) in Thonny, minicom or screen, sorry.

Here's a link to the thread on the Raspberry Pi forum: weird character when using UTF-8 - Raspberry Pi Forums

Citylights
Posts: 13
Joined: Sat Jul 10, 2021 7:31 pm

Re: Terminal UTF-8 text corruption

Post by Citylights » Mon Jul 12, 2021 4:44 am

scruss wrote:
Sun Jul 11, 2021 9:00 pm
Can't reproduce on Raspberry Pi Pico with MicroPython v1.16 (2021-06-18) in Thonny, minicom or screen, sorry.
But the other guy could reproduce the issue, as I said in my first post and as is obvious from the link.

nvm, it's not a big deal for me; I thought the devs might be interested.

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: Terminal UTF-8 text corruption

Post by jimmo » Mon Jul 12, 2021 6:49 am

Citylights wrote:
Mon Jul 12, 2021 4:44 am
nvm, it's not big deal for me, i thought the devs might be interested.
I've been trying to replicate this here... I changed your code to do a shorter sleep and changed the \r to a \n (so I can see past messages). I haven't been able to make it happen.

Is there anything else particular about your program? Does it only happen when you write to the file?

I think it's unlikely to be a power or USB cable issue.

Is there any chance you can capture what the invalid data is? See the suggestion in the rpi forum here: https://www.raspberrypi.org/forums/view ... 4#p1885091 (note that you need to install pyserial, i.e. pip install pyserial).
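
One way such a capture could be done on the PC side is sketched below. This is not the script from the linked rpi forum post, only an illustration under assumptions: the port name is a placeholder, the 115200 baud rate is a conventional default, and the helper names are made up. The capture function uses pyserial's serial.Serial and its read method; the rest is plain Python, so the analysis part can be tried without a board. The demo at the end uses a synthetic byte string, not data captured in this thread:

```python
import time

# The bytes the board is expected to send for the Greek word.
EXPECTED = "\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1".encode("utf-8")


def split_lines(raw):
    """Split captured bytes on CR or LF and drop empty lines."""
    return [ln for ln in raw.replace(b"\r", b"\n").split(b"\n") if ln]


def find_corrupt_lines(raw, expected=EXPECTED):
    """Return the lines that do not contain the expected byte sequence."""
    return [ln for ln in split_lines(raw) if expected not in ln]


def hexdump(data):
    """Format bytes as space-separated uppercase hex."""
    return " ".join("%02X" % b for b in data)


def capture(port, seconds=60, baudrate=115200):
    """Read raw bytes from a serial port for the given number of seconds."""
    import serial  # provided by pyserial (pip install pyserial)
    chunks = []
    with serial.Serial(port, baudrate, timeout=1) as ser:
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            chunks.append(ser.read(256))
    return b"".join(chunks)


# Synthetic demo (made-up data): one good line and one with a byte replaced.
good = EXPECTED + b" 25\r"
bad = EXPECTED[:2] + b"\xff" + EXPECTED[3:] + b" 25\r"
for line in find_corrupt_lines(good + bad):
    print(hexdump(line))
```

With a board attached, something like find_corrupt_lines(capture("/dev/ttyACM0")) would list the suspicious lines; the port name depends on the system (for example COM3 on Windows). Any other output on the port, such as the REPL banner, would be flagged too, so the result needs a manual look.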
