A special case in bytes decode function

swanduron
Posts: 10
Joined: Mon Mar 13, 2017 9:03 am

Post by swanduron » Sun Nov 05, 2017 8:53 am

I am trying to convert a bytes object to a str, but uPy gives an unexpected result, shown below. Can anyone explain this case?

uPy 1.9.2:

>>> b'\xa5'.decode()
'%'
>>> b'\xa5'.decode() == '%'
False
>>> b'%'.decode() == '%'
True

General Python 3.5.2:

>>> b'\xa5'.decode('utf8')
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    b'\xa5'.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 0: invalid start byte

swanduron
Posts: 10
Joined: Mon Mar 13, 2017 9:03 am

Re: A special case in bytes decode function

Post by swanduron » Sun Nov 05, 2017 9:05 am

Sorry, I phrased that badly. My question is: why is the decoded b'\xa5' not equal to the character '%'? And why can't b'\xa5' be decoded by any codec in general Python?

deshipu
Posts: 1388
Joined: Thu May 28, 2015 5:54 pm

Re: A special case in bytes decode function

Post by deshipu » Sun Nov 05, 2017 10:03 am

A bytes object is not equal to a string object, because they have different types.

Python 3 can't decode the bytestring b'\xa5' because 0xa5 is not a valid start byte in UTF-8: its bit pattern (10xxxxxx) marks a continuation byte, which may only follow the lead byte of a multi-byte sequence. MicroPython doesn't care, because its UTF-8 support is very rudimentary and hacky, due to size constraints.

Does that make sense?
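
To make the UTF-8 point concrete, here is a minimal sketch for regular Python 3 that classifies a byte by its leading bits. The helper name utf8_byte_role is made up for illustration, and it looks only at the bit pattern (a strict decoder also rejects some lead bytes, such as 0xC0, that this check lets through).

```python
def utf8_byte_role(byte):
    """Classify one byte value (0-255) by its UTF-8 bit pattern.

    Hypothetical helper for illustration only; not part of any library.
    """
    if byte < 0x80:
        return "ascii"         # 0xxxxxxx: a complete one-byte character
    if byte < 0xC0:
        return "continuation"  # 10xxxxxx: only valid after a lead byte
    if byte < 0xE0:
        return "lead-2"        # 110xxxxx: starts a two-byte sequence
    if byte < 0xF0:
        return "lead-3"        # 1110xxxx: starts a three-byte sequence
    if byte < 0xF8:
        return "lead-4"        # 11110xxx: starts a four-byte sequence
    return "invalid"           # 11111xxx never appears in UTF-8


role_a5 = utf8_byte_role(0xA5)       # 0xa5 cannot start a character
role_percent = utf8_byte_role(0x25)  # '%' is plain ASCII

# U+00A5 (the yen sign) is encoded in UTF-8 as the two bytes C2 A5,
# so 0xa5 decodes fine when it follows the lead byte 0xc2.
yen = b'\xc2\xa5'.decode('utf-8')

# 0xa5 with its top bit cleared is 0x25, the code for '%'.
low_seven_bits = 0xA5 & 0x7F
```

The last line is only an arithmetic observation: if something in the output path dropped the high bit, 0xa5 would display as '%'. Whether that is what happened on uPy 1.9.2 is a guess, not something established in this thread.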

pfalcon
Posts: 1155
Joined: Fri Feb 28, 2014 2:05 pm

Re: A special case in bytes decode function

Post by pfalcon » Mon Nov 06, 2017 3:26 pm

$ micropython 
MicroPython v1.9.3-32-g7434a67a5-dirty on 2017-11-05; linux version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> b'\xa5'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: 

dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada

Re: A special case in bytes decode function

Post by dhylands » Mon Nov 06, 2017 5:05 pm

You get something similar in Python3:

Python 3.5.3 (default, Sep 24 2017, 15:18:48) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b'\xa5'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 0: invalid start byte
>>> 

Rather than trying to convert b'\xa5' to a character (which you can't, since b'\xa5' on its own isn't valid UTF-8), why don't you just compare against b'%'?
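
A short sketch of what comparing at the bytes level looks like, plus two ways regular Python 3 can turn the byte into a str if that is really needed. The variable names are illustrative, and the 'latin-1' codec and the 'replace' error handler are shown for CPython; a given MicroPython build may not support them.

```python
raw = b'\xa5'

# Compare bytes with bytes: no decoding step, so nothing can fail.
is_percent = raw == b'%'      # 0xa5 is not 0x25
is_a5 = raw == b'\xa5'
first_value = raw[0]          # indexing bytes gives an int

# Latin-1 maps every byte value 0-255 to the code point with the
# same number, so this decode cannot raise.
as_latin1 = raw.decode('latin-1')

# Keep UTF-8 but substitute U+FFFD for anything undecodable.
as_replaced = raw.decode('utf-8', 'replace')

print(is_percent, is_a5, hex(first_value))
print(repr(as_latin1), repr(as_replaced))
```

Staying in bytes is usually the simpler choice when the data comes from a serial port or a sensor, because such data often isn't text in any encoding.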
