bits to bytes

General discussions and questions about development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
KJM
Posts: 158
Joined: Sun Nov 18, 2018 10:53 pm
Location: Sydney AU

bits to bytes

Post by KJM » Fri Jun 24, 2022 9:46 pm

I've got a dozen contact closures that I've scanned into a 12-bit list. I want to convert them into bytes to send over LoRa.

Code: Select all

>>> import struct; bits=[0,0,0,1,0,0,0,0,1,0,0,0]; bytes=b''
>>> for i in range(0,len(bits),8): den=int("".join(map(str, bits[i:i+8])), 2); bytes+=struct.pack('b',den); print(bytes)
... 
b'\x10'
b'\x10\x08'
This works, but it feels clunky having to convert to denary on the way. Is there a better way to do the conversion?

Christian Walther
Posts: 169
Joined: Fri Aug 19, 2016 11:55 am

Re: bits to bytes

Post by Christian Walther » Sat Jun 25, 2022 8:06 am

What I would do is

Code: Select all

>>> struct.pack('<H', sum(b*(1<<(i^7)) for i, b in enumerate(bits)))
b'\x10\x80'
This is probably a bit more efficient, as it involves no string manipulation. The i^7 is to get the first bit into the most-significant place as you did (big-endian): XOR with 7 flips the low three bits of the index, so position i%8 within each byte becomes 7-(i%8). If you can choose the order, then using just i for little-endian works as well.

Note that your ordering is a bit inconsistent: you put the first 4 bits into the high nibble of the first byte, the next 4 into the low nibble of the first byte, and the last 4 again into the low nibble of the second byte. My variant consistently alternates between high and low nibble.

Note also that there is no denary involved in your variant. Your den is just an integer, and, on most digital computers, is stored in binary, not in denary – but you don’t need to care about that.
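
For bit lists longer than 16 entries, the '<H' format no longer fits. The same idea can be sketched without struct, assuming a hypothetical helper name pack_bits (not from this thread) and the same two bit orders discussed above:

```python
def pack_bits(bits, msb_first=True):
    """Pack a list of 0/1 values into bytes, 8 bits per byte.

    pack_bits is a hypothetical helper name used for illustration.
    A final partial byte is padded with zero bits.
    """
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            # Position inside the byte: 7..0 for MSB-first, 0..7 for LSB-first.
            shift = 7 - (i % 8) if msb_first else i % 8
            out[i // 8] |= 1 << shift
    return bytes(out)


bits = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
print(pack_bits(bits))                   # b'\x10\x80'
print(pack_bits(bits, msb_first=False))  # b'\x08\x01'
```

Both results match the struct.pack one-liners for this 12-bit input, and the loop avoids building one large integer first.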


Re: bits to bytes

Post by KJM » Sun Jun 26, 2022 1:27 am

Your one-liner looked super elegant, Christian, so I ran it forwards & backwards to see if it performed as well as it looks

Code: Select all

>>> bits=[0,0,0,1,0,0,0,0,1,0,0,0]
>>> struct.pack('<H', sum(b*(1<<(i^7)) for i, b in enumerate(bits)))
b'\x10\x80'
>>> struct.pack('<H', sum(b*(1<<(i)) for i, b in enumerate(bits)))
b'\x08\x01'
All good. Then I jumbled the tumblers just to double-check

Code: Select all

>>> bits=[0,0,1,0,0,0,0,0,1,0,0,0]
>>> struct.pack('<H', sum(b*(1<<(i^7)) for i, b in enumerate(bits)))
b' \x80'
>>> struct.pack('<H', sum(b*(1<<(i)) for i, b in enumerate(bits)))
b'\x04\x01'
But I expected b'\x40\x80' then b'\x04\x01'. This is because bloody Python likes to throw the occasional ASCII character into its byte representations just to confuse me. Is there a way to force it to show b'\x40\x80' in lieu of b' \x80'?


Re: bits to bytes

Post by Christian Walther » Sun Jun 26, 2022 8:19 am

KJM wrote:
Sun Jun 26, 2022 1:27 am
Is there a way to force it to show b'\x40\x80' in lieu of b' \x80'?
No, but you can use

Code: Select all

>>> import binascii
>>> binascii.hexlify(b' \x80', ' ')
b'20 80'
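
Note that the space character is byte value 0x20 (not 0x40), which is why the hexlify output starts with 20. A quick check, as a sketch that should behave the same in CPython and MicroPython:

```python
data = b' \x80'

# The printable form and the fully escaped form are the same two bytes.
assert data == b'\x20\x80'
assert len(data) == 2
assert data[0] == 0x20 and data[1] == 0x80

# Per-byte view that never substitutes ASCII characters.
print([hex(v) for v in data])  # ['0x20', '0x80']
```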


Re: bits to bytes

Post by KJM » Sun Jun 26, 2022 11:22 pm

So no easy way to get b'40 80'? Seems to me Python is being as slippery with octal/hex here as it sometimes is with types?

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: bits to bytes

Post by jimmo » Mon Jun 27, 2022 2:16 am

KJM wrote:
Sun Jun 26, 2022 11:22 pm
So no easy way to get b'40 80'? Seems to me Python is being as slippery with octal/hex here as it sometimes is with types?
Python's rule for printing bytes (or bytearrays) is to show the character if it's printable, otherwise the escape sequence. The actual data is correct though, and if you send it over the network (e.g. LoRa) then it will be fine.

Unfortunately there's no way to turn off this behavior, but hexlify is the best option for printing it out in hexadecimal (and it's clearer than having it with all the \x in the middle).


Re: bits to bytes

Post by KJM » Mon Jun 27, 2022 3:10 am

My head says you're right, jimmo, & it will be fine over LoRa, but my heart is saying 'you can't trust it, use chr/ord (where wysiwyg) instead'


Re: bits to bytes

Post by jimmo » Mon Jun 27, 2022 5:42 am

KJM wrote:
Mon Jun 27, 2022 3:10 am
My head says you're right, jimmo, & it will be fine over LoRa, but my heart is saying 'you can't trust it, use chr/ord (where wysiwyg) instead'
Maybe this will help you.

Code: Select all

>>> b=bytes((0x08, 0x50, 0xf0,))
>>> print(b)
b'\x08P\xf0'
>>> import binascii
>>> print(f'0x{binascii.hexlify(b).decode()}')
0x0850f0
>>> print(list(i for i in b))
[8, 80, 240]
>>> print(list(hex(i) for i in b))
['0x8', '0x50', '0xf0']
So we have three bytes, the second of which is printable. Printing them will escape the first and last.

If I want to see their actual hex values, I can either use hexlify to see them as two-character hex (the way I did it above is my preferred way of seeing this, but using : as the separator is also convenient), or I can print their actual numerical values (first as decimal, second as hexadecimal strings).

There's nothing funny going on; bytes are just bytes. The only perhaps surprising thing is that printing a bytes or bytearray object will not escape printable characters.

(One annoying thing in Python is that hexlify returns bytes if the input is bytes, which is why we call decode to convert it to str. It's an unfortunate historical accident in Python. They fixed this when they added bytes.hex, but unfortunately we do not support that in MicroPython... see https://github.com/micropython/micropython/pull/7539.)
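
The hexlify-then-decode pattern can be wrapped in a small helper. This is only a sketch: hex_str is a hypothetical name, and it inserts the separator itself rather than assuming bytes.hex or a separator argument is available on a given port.

```python
import binascii


def hex_str(data, sep=' '):
    """Return bytes as two-digit hex pairs joined by sep (hypothetical helper)."""
    h = binascii.hexlify(data).decode()  # e.g. '0850f0'
    # Split the hex string into pairs, one pair per input byte.
    return sep.join(h[i:i + 2] for i in range(0, len(h), 2))


print(hex_str(b'\x08P\xf0'))   # 08 50 f0
print(hex_str(b' \x80', ':'))  # 20:80
```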


Re: bits to bytes

Post by KJM » Tue Jun 28, 2022 6:00 am

Thanks. When I googled 'escape' I got a lot of stuff about 'escape characters', but I don't think that's what you're referring to? Can you explain 'escape' in this context?


Re: bits to bytes

Post by jimmo » Tue Jun 28, 2022 3:08 pm

KJM wrote:
Tue Jun 28, 2022 6:00 am
Can you explain 'escape' in this context?
Yes I mean escape characters -- in Python (like C, Java, etc), the backslash is the escape character. It indicates the start of a sequence to let you tell the compiler exactly what character you want -- useful if that character isn't printable. So for example \n for newline (which is the byte value 10 in decimal, or 0x0a in hex).

In Python, \xXX lets you specify the hexadecimal byte value. "\x0a" is equivalent to "\n". Or like in the previous example, "\x50" is the same as "P".

This is why, when you print a bytes object (or the repr of a string) containing an unprintable character, Python turns it into this format as a convenience: you can copy and paste it in as a literal and the compiler will understand it.

(Other languages have a different escape character... for example, %20 in a URL means hex 20, which is space.)
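
The escape equivalences described above can be checked directly. A minimal sketch, using the three-byte value from the earlier example:

```python
# Each pair is the same value written two ways.
assert "\x0a" == "\n"   # newline, byte value 10
assert "\x50" == "P"    # a printable character
assert b"\x08\x50\xf0" == b"\x08P\xf0"

data = bytes((0x08, 0x50, 0xf0))
shown = repr(data)      # the text that print(data) displays
print(shown)

# Pasting that text back in as a literal recreates the same bytes.
assert eval(shown) == data
```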
