Terminal UTF-8 text corruption
Posted: Sun Jul 11, 2021 10:11 am
by Citylights
Hi,
It seems I have discovered something that looks like a bug in MicroPython (v1.16.uf2): there is random corruption
in the text printed to the terminal when using UTF-8.
Since I'm too new to ARM microcontrollers and MicroPython, I can't do much research into how and why
this happens, but I've noticed that the corrupted text only appears in the terminal. If you save the same text
to a file, it appears normal.
There was a discussion going on in another forum, and a member there could confirm the issue
by running the same code I'm posting below. He is also a member here, so he can jump into
this thread if he wants to say more about it.
Code:
import machine
import utime

led = machine.Pin(25, machine.Pin.OUT)  # on-board LED of the Pico
sensor_temp = machine.ADC(4)  # internal temperature sensor
conversion_factor = 3.3 / 65535  # ADC counts to volts
temp_file = 'temp.txt'

while True:
    reading = sensor_temp.read_u16() * conversion_factor
    temperature = 27 - (reading - 0.706) / 0.001721
    # The escaped string is the Greek word for "temperature"
    print("\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1", int(temperature), end="\r")
    led.high()
    # Append the same text to a file, where it shows up uncorrupted
    with open(temp_file, 'a') as f:
        f.write("\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1 " + str(temperature) + '\n')
    led.low()
    utime.sleep(7)
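The observation that the file looks normal could be checked byte for byte rather than by eye. The following is only a sketch, not part of the original program: `check_log` and `EXPECTED` are hypothetical names, and it assumes every line in the file starts with the UTF-8 encoding of the Greek word, as the code above writes it.

```python
# UTF-8 bytes that every logged line is expected to start with
EXPECTED = "\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1".encode("utf-8")

def check_log(path):
    """Return the 1-based numbers of lines that do not start with EXPECTED."""
    bad = []
    # Binary mode, so no decoding step can hide or repair damaged bytes
    with open(path, "rb") as f:
        for number, line in enumerate(f, 1):
            if not line.startswith(EXPECTED):
                bad.append(number)
    return bad
```

Calling `check_log('temp.txt')` and getting an empty list would support the claim that only the terminal output is affected.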
Re: Terminal UTF-8 text corruption
Posted: Sun Jul 11, 2021 2:19 pm
by Christian Walther
Can you be more specific on what you are doing, what you expect to happen, and what happens instead? And try to give a more minimal reproducing example, I assume the whole temperature measuring stuff is unnecessary and just makes it harder to try for people who don’t have a temperature sensor handy?
I do notice that when I use screen to look at the serial output and issue
Code:
print("\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1")
it says ?ε?μοκ?α?ία, however that is a problem of screen (it appears to filter out bytes that it perceives to be control characters) and the solution is to use screen -U (or miniterm.py --raw), then it correctly says Θερμοκρασία for me (on a terminal that is set to UTF-8).
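The pattern of question marks is consistent with that explanation. As a sketch, assuming that "control characters" here means the C1 range 0x80 to 0x9F (an assumption, not something confirmed from the screen source), the snippet below lists the UTF-8 bytes of each letter and masks every letter that contains a byte in that range. `has_c1_byte` and `mask_c1` are hypothetical helper names.

```python
WORD = "\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1"

def has_c1_byte(ch):
    """True if any UTF-8 byte of ch falls in the assumed C1 range 0x80-0x9F."""
    return any(0x80 <= b <= 0x9F for b in ch.encode("utf-8"))

def mask_c1(text):
    """Replace every character that contains such a byte with '?'."""
    return "".join("?" if has_c1_byte(ch) else ch for ch in text)

# Show each letter with its encoded bytes and whether it would be affected
for ch in WORD:
    encoded = " ".join("{:02x}".format(b) for b in ch.encode("utf-8"))
    print(ch, encoded, has_c1_byte(ch))

print(mask_c1(WORD))  # ?ε?μοκ?α?ία
```

The masked letters are Θ (ce 98), ρ (cf 81) and σ (cf 83), which are exactly the positions that screen shows as question marks without -U.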
Re: Terminal UTF-8 text corruption
Posted: Sun Jul 11, 2021 2:32 pm
by Citylights
Yes, it should say "Θερμοκρασία", but as you can see in the photos, random characters
pop up in some of the lines, and I don't think it's the same case as the one you describe.
Sorry that I posted all of my code, but I guess it is preferable to have a clear picture of
what I was doing when I noticed the text corruption.
Re: Terminal UTF-8 text corruption
Posted: Sun Jul 11, 2021 2:51 pm
by jimmo
Can you try using a different terminal? (I'm not really sure what to suggest, but maybe some Unicode-aware Windows terminal?)
Re: Terminal UTF-8 text corruption
Posted: Sun Jul 11, 2021 3:06 pm
by Citylights
I already did: one is on an Android phone and the other is the Thonny IDE on a PC.
And if you look closely, the problem does not appear on every line of text.
You can take the part of my code that prints the text and put it in a loop
to see whether it also happens on the terminal you use.
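A loop like the one described could look as follows. This is a minimal sketch rather than the exact code anyone in the thread ran: `print_loop`, its parameters and the line counter are hypothetical additions, and it ends each line with a newline so that earlier lines stay visible.

```python
try:
    import utime as time  # MicroPython
except ImportError:
    import time  # CPython, for trying the loop on a PC

WORD = "\u0398\u03B5\u03C1\u03BC\u03BF\u03BA\u03C1\u03B1\u03C3\u03AF\u03B1"

def print_loop(count=50, delay_s=0.5):
    """Print the Greek word count times, each line prefixed with a counter."""
    for i in range(count):
        # The counter makes it easy to say which line came out corrupted
        print(i, WORD)
        time.sleep(delay_s)
    return count
```

Running `print_loop()` needs no sensor or file access, so it separates the printing from the rest of the program.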
Re: Terminal UTF-8 text corruption
Posted: Sun Jul 11, 2021 3:45 pm
by Christian Walther
Okay, that’s odd. I’m not getting anything like that, screen -U is showing dozens of correct Θερμοκρασία. Does it only happen with non-ASCII characters? I guess you’d need to check with an oscilloscope or logic analyzer to be sure what’s happening.
Re: Terminal UTF-8 text corruption
Posted: Sun Jul 11, 2021 3:58 pm
by Citylights
Christian Walther wrote: ↑Sun Jul 11, 2021 3:45 pm
Okay, that’s odd. I’m not getting anything like that, screen -U is showing dozens of correct Θερμοκρασία. Does it only happen with non-ASCII characters? I guess you’d need to check with an oscilloscope or logic analyzer to be sure what’s happening.
Yes, there's no corruption when using English characters.
An oscilloscope or logic analyzer is not available. I thought I should report the issue here so people know about it
and can look into it further if they find it important.
Still, I wonder what it can be, since with English letters everything is fine.
Re: Terminal UTF-8 text corruption
Posted: Sun Jul 11, 2021 9:00 pm
by scruss
Can't reproduce on Raspberry Pi Pico with MicroPython v1.16 (2021-06-18) in Thonny, minicom or screen, sorry.
Here's a link to the thread on the Raspberry Pi forum:
weird character when using UTF-8 - Raspberry Pi Forums
Re: Terminal UTF-8 text corruption
Posted: Mon Jul 12, 2021 4:44 am
by Citylights
scruss wrote: ↑Sun Jul 11, 2021 9:00 pm
Can't reproduce on Raspberry Pi Pico with MicroPython v1.16 (2021-06-18) in Thonny, minicom or screen, sorry.
But the other guy could reproduce the issue, as I said in my first post and as is obvious from the link.
Nvm, it's not a big deal for me; I thought the devs might be interested.
Re: Terminal UTF-8 text corruption
Posted: Mon Jul 12, 2021 6:49 am
by jimmo
Citylights wrote: ↑Mon Jul 12, 2021 4:44 am
nvm, it's not big deal for me, i thought the devs might be interested.
I've been trying to replicate this here... I changed your code to do a shorter sleep and the \r to a \n (so I can see past messages). I haven't been able to make it happen.
Is there anything else particular about your program? Does it only happen when you write to the file?
I think it's unlikely to be power or USB cable issues.
Is there any chance you can capture what the invalid data is? See the suggestion in the rpi forum here:
https://www.raspberrypi.org/forums/view ... 4#p1885091 (The issue there is that you need to install pyserial, i.e. pip install pyserial).
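One way to capture the raw data on the PC side might look like this. It is a sketch under assumptions and not the script from the linked post: `capture` and `find_invalid_utf8` are hypothetical names, the port name depends on the system (for example /dev/ttyACM0 or COM3), and any other program using the port, such as Thonny, probably has to be closed first.

```python
import time

def find_invalid_utf8(data):
    """Return the byte offsets at which data fails to decode as UTF-8."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest decodes cleanly
        except UnicodeDecodeError as exc:
            bad.append(pos + exc.start)
            pos += exc.start + 1  # skip the offending byte and continue
    return bad

def capture(port, seconds=30.0):
    """Read raw bytes from the serial port for the given number of seconds."""
    import serial  # pyserial; imported here so the helper above works without it
    data = bytearray()
    deadline = time.time() + seconds
    with serial.Serial(port, 115200, timeout=1) as ser:
        while time.time() < deadline:
            data.extend(ser.read(256))  # returns fewer bytes on timeout
    return bytes(data)
```

Usage would be something like `raw = capture("/dev/ttyACM0")` followed by `print(raw.hex())` and `print(find_invalid_utf8(raw))`. A character cut off at the very end of the capture is also reported as invalid, so an offset near the end is not necessarily corruption.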