A special case in bytes decode function

Posted: Sun Nov 05, 2017 8:53 am
by swanduron
I'm trying to convert a bytes object to str, but uPy gives an unreasonable result, shown below. Can anyone explain this case?

uPy 1.9.2:

>>> b'\xa5'.decode()
'%'
>>> b'\xa5'.decode() == '%'
False
>>> b'%'.decode() == '%'
True

Standard Python 3.5.2:

>>> b'\xa5'.decode('utf8')
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    b'\xa5'.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 0: invalid start byte

Re: A special case in bytes decode function

Posted: Sun Nov 05, 2017 9:05 am
by swanduron
Sorry, my mistake. My question is: why is b'\xa5' not equal to the character '%'? And why can't b'\xa5' be decoded with any codec in standard Python?

Re: A special case in bytes decode function

Posted: Sun Nov 05, 2017 10:03 am
by deshipu
A bytes object is not equal to a string object, because they have different types.
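
To illustrate the type point, here is a minimal sketch (the variable names are only for illustration) showing that a bytes object and a str never compare equal in Python 3, even when both hold the same ASCII character, and that converting one side first makes the comparison meaningful:

```python
raw = b'%'    # a bytes object holding the single byte 0x25
text = '%'    # a str holding the single character '%'

# bytes and str are different types, so this comparison is always False.
mixed_equal = (raw == text)

# Decode the bytes first, then compare str with str.
decoded_equal = (raw.decode('utf-8') == text)

# Or encode the str, then compare bytes with bytes.
encoded_equal = (raw == text.encode('utf-8'))

print(mixed_equal, decoded_equal, encoded_equal)
```

Only the last two comparisons can ever be True, because both operands have the same type.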

Python 3 can't decode the bytestring b'\xa5' because 0xa5 (binary 10100101) is a utf8 continuation byte: it is only valid after a lead byte that starts a multi-byte sequence, so on its own it doesn't specify any character. That is what "invalid start byte" means in the error message. MicroPython doesn't care, because its utf8 support is very rudimentary and hacky, due to size constraints.
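
A rough sketch of how a single byte can be classified by its UTF-8 role, based on its high bits. The helper name `utf8_byte_kind` is made up for this example, and the classification is simplified: it does not reject the few lead-byte values that a strict decoder also refuses.

```python
def utf8_byte_kind(byte):
    """Classify a byte value (0-255) by its role in UTF-8 (simplified)."""
    if byte < 0x80:
        return 'ascii'         # 0xxxxxxx: a complete one-byte character
    if byte < 0xC0:
        return 'continuation'  # 10xxxxxx: only valid after a lead byte
    if byte < 0xE0:
        return 'lead-2'        # 110xxxxx: starts a 2-byte sequence
    if byte < 0xF0:
        return 'lead-3'        # 1110xxxx: starts a 3-byte sequence
    if byte < 0xF8:
        return 'lead-4'        # 11110xxx: starts a 4-byte sequence
    return 'invalid'           # 11111xxx: never appears in UTF-8

kind_a5 = utf8_byte_kind(0xA5)   # the byte from the question
kind_25 = utf8_byte_kind(0x25)   # the byte for '%'

# A strict decoder rejects a continuation byte that has no lead byte.
try:
    b'\xa5'.decode('utf-8')
    decode_failed = False
except UnicodeError:
    # CPython raises UnicodeDecodeError, a subclass of UnicodeError.
    decode_failed = True

print(kind_a5, kind_25, decode_failed)
```

Catching `UnicodeError` rather than `UnicodeDecodeError` keeps the sketch usable on implementations that only raise the base class.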

Does that make sense?

Re: A special case in bytes decode function

Posted: Mon Nov 06, 2017 3:26 pm
by pfalcon

$ micropython 
MicroPython v1.9.3-32-g7434a67a5-dirty on 2017-11-05; linux version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> b'\xa5'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: 

Re: A special case in bytes decode function

Posted: Mon Nov 06, 2017 5:05 pm
by dhylands
You get something similar in Python3:

Python 3.5.3 (default, Sep 24 2017, 15:18:48) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b'\xa5'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 0: invalid start byte
>>> 

Rather than trying to convert b'\xa5' to a character (which you can't, since b'\xa5' on its own isn't valid UTF-8), why don't you just compare to b'%'?
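
A minimal sketch of working with the raw byte directly, without decoding (the variable names are illustrative). It also shows that 0xa5 and 0x25 differ only in the high bit. That might explain why a terminal could display '%' for this byte, but this is a guess and not something confirmed in this thread.

```python
data = b'\xa5'

# Compare bytes with bytes; no decoding is involved.
is_a5 = (data == b'\xa5')
is_percent = (data == b'%')   # 0xa5 and 0x25 are different bytes

# Indexing a bytes object yields an int, which can be compared directly.
first = data[0]

# Clearing the high bit of 0xa5 gives 0x25, the ASCII code for '%'.
low7 = first & 0x7F

print(is_a5, is_percent, hex(first), hex(low7))
```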