A special case in bytes decode function

swanduron · Post by **swanduron** » Sun Nov 05, 2017 8:53 am

I am trying to convert a bytes object to a str, but uPy gives a surprising result, shown below. Can anyone explain this case?

MicroPython 1.9.2

>>> b'\xa5'.decode()
'%'
>>> b'\xa5'.decode() == '%'
False
>>> b'%'.decode() == '%'
True

Standard Python 3.5.2

>>> b'\xa5'.decode('utf8')
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    b'\xa5'.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 0: invalid start byte

swanduron · Post by **swanduron** » Sun Nov 05, 2017 9:05 am

Sorry, my question was unclear. What I want to know is: why is the decoded b'\xa5' not equal to the character '%'? And why can't b'\xa5' be decoded by any codec in standard Python?

deshipu · Post by **deshipu** » Sun Nov 05, 2017 10:03 am

A bytes object is not equal to a string object, because they have different types.

Python 3 can't decode the bytestring b'\xa5' because 0xa5 is a UTF-8 continuation byte (bit pattern 10xxxxxx). Such a byte is only valid after the lead byte of a multi-byte sequence, so on its own it is not a valid start byte, which is exactly what the error message says. MicroPython doesn't care, because its UTF-8 support is very rudimentary and hacky, due to size constraints.
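To see where 0xa5 falls in the UTF-8 scheme, here is a minimal sketch that classifies a byte by its high bits. The helper name `utf8_byte_kind` is made up for illustration. It only looks at the bit pattern and ignores the finer rules (for example, 0xC0 and 0xC1 match the lead-byte pattern but are never valid in UTF-8).

```python
def utf8_byte_kind(b):
    """Classify a single byte value (0-255) by its UTF-8 bit pattern.

    Hypothetical helper for illustration; it checks the high bits only.
    """
    if b < 0x80:
        return "ascii"         # 0xxxxxxx: a complete one-byte character
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: only valid after a lead byte
    if b < 0xE0:
        return "lead-2"        # 110xxxxx: starts a 2-byte sequence
    if b < 0xF0:
        return "lead-3"        # 1110xxxx: starts a 3-byte sequence
    if b < 0xF8:
        return "lead-4"        # 11110xxx: starts a 4-byte sequence
    return "invalid"           # 11111xxx: never used in UTF-8


print(utf8_byte_kind(0xA5))  # continuation
print(utf8_byte_kind(0x25))  # ascii (this is '%')
```

Preceded by a suitable lead byte, the same byte decodes fine in standard Python: `b'\xc2\xa5'.decode('utf-8')` yields the single character U+00A5.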

Does that make sense?

pfalcon · Post by **pfalcon** » Mon Nov 06, 2017 3:26 pm

$ micropython 
MicroPython v1.9.3-32-g7434a67a5-dirty on 2017-11-05; linux version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> b'\xa5'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError:

dhylands · Post by **dhylands** » Mon Nov 06, 2017 5:05 pm

You get something similar in Python3:

Python 3.5.3 (default, Sep 24 2017, 15:18:48) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b'\xa5'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 0: invalid start byte
>>>

Rather than trying to convert b'\xa5' to a character (which you can't, since b'\xa5' on its own isn't valid UTF-8), why don't you just compare to b'%'?
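A rough sketch of both approaches, assuming standard Python 3 or a MicroPython build that raises on invalid input (as v1.9.3 does in the session above): compare bytes with bytes directly, and guard any decode with a try/except. The helper name `try_decode` is made up for illustration.

```python
raw = b'\xa5'

# Compare bytes with bytes; no decoding is involved.
print(raw == b'%')     # False: 0xa5 is not 0x25
print(raw == b'\xa5')  # True


def try_decode(data):
    """Return data decoded as UTF-8, or None if it is not valid UTF-8.

    Hypothetical helper for illustration.
    """
    try:
        return data.decode('utf-8')
    except UnicodeError:
        # In CPython, UnicodeDecodeError is a subclass of UnicodeError,
        # so this clause catches it as well.
        return None


print(try_decode(b'%'))  # %
print(try_decode(raw))   # None
```

On a build that silently accepts invalid bytes, such as the 1.9.2 session at the top of this thread, `try_decode` would not return None, so the direct bytes comparison is the more dependable check.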

MicroPython Forum (Archive)
