
bits to bytes

Posted: Fri Jun 24, 2022 9:46 pm
by KJM
I've got a dozen contact closures that I've scanned into a 12-bit list. I want to convert them into bytes to send over LoRa.

Code:

>>> import struct; bits=[0,0,0,1,0,0,0,0,1,0,0,0]; out=b''
>>> for i in range(0,len(bits),8): den=int("".join(map(str, bits[i:i+8])), 2); out+=struct.pack('B',den); print(out)
... 
b'\x10'
b'\x10\x08'
This works, but it feels clunky having to convert to denary on the way. Is there a better way to do the conversion?

Re: bits to bytes

Posted: Sat Jun 25, 2022 8:06 am
by Christian Walther
What I would do is

Code:

>>> struct.pack('<H', sum(b*(1<<(i^7)) for i, b in enumerate(bits)))
b'\x10\x80'
This is probably a bit more efficient as it involves no string manipulation. The i^7 is to get the first bit into the most-significant place as you did (big-endian). If you can choose the order, then using just i for little-endian works as well.
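As a small illustration of the `i^7` trick (a sketch, not part of the original post): XOR with 7 flips the low three bits of the index, so the position inside each group of eight is reversed while the group itself stays put.

```python
# XOR with 7 flips the low three bits of i: the position inside each
# group of eight is reversed, the group (byte) itself is unchanged.
shifts = [i ^ 7 for i in range(12)]
print(shifts)  # [7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12]
```

So list index 0 lands on bit 7 (the top bit of the low byte) and index 8 lands on bit 15 (the top bit of the high byte).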

Note that your ordering is a bit inconsistent: you put the first 4 bits into the high nibble of the first byte, the next 4 into the low nibble of the first byte, and the last 4 again into the low nibble of the second byte. My variant consistently alternates between high and low nibble.

Note also that there is no denary involved in your variant. Your den is just an integer, and, on most digital computers, is stored in binary, not in denary – but you don’t need to care about that.
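If the list might grow beyond 16 bits, the `'<H'` format no longer fits, since it always packs exactly two bytes. One possible alternative, sketched here with a hypothetical helper named `pack_bits` (not from the thread), sets the bits directly in a bytearray and should give the same first-bit-in-the-MSB ordering as the one-liner above.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, first bit in the MSB.

    Hypothetical helper for illustration; the name is an assumption.
    """
    out = bytearray((len(bits) + 7) // 8)  # round up to whole bytes
    for i, b in enumerate(bits):
        if b:
            # i >> 3 is the byte index, i & 7 the bit position in that byte
            out[i >> 3] |= 0x80 >> (i & 7)
    return bytes(out)

print(pack_bits([0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]))  # b'\x10\x80'
```

Unused bits in the last byte are left as zero.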

Re: bits to bytes

Posted: Sun Jun 26, 2022 1:27 am
by KJM
Your one-liner looked super elegant, Christian, so I ran it forwards & backwards to see if it performed as well as it looks

Code:

>>> bits=[0,0,0,1,0,0,0,0,1,0,0,0]
>>> struct.pack('<H', sum(b*(1<<(i^7)) for i, b in enumerate(bits)))
b'\x10\x80'
>>> struct.pack('<H', sum(b*(1<<(i)) for i, b in enumerate(bits)))
b'\x08\x01'
all good. Then I jumbled the tumblers just to double check

Code:

>>> bits=[0,0,1,0,0,0,0,0,1,0,0,0]
>>> struct.pack('<H', sum(b*(1<<(i^7)) for i, b in enumerate(bits)))
b' \x80'
>>> struct.pack('<H', sum(b*(1<<(i)) for i, b in enumerate(bits)))
b'\x04\x01'
But I expected b'\x40\x80' then b'\x04\x01'. This is because bloody Python likes to throw the occasional ASCII character into its byte representations just to confuse me. Is there a way to force it to show b'\x40\x80' in lieu of b' \x80'?

Re: bits to bytes

Posted: Sun Jun 26, 2022 8:19 am
by Christian Walther
KJM wrote:
Sun Jun 26, 2022 1:27 am
Is there a way to force it to show b'\x40\x80' in lieu of b' \x80'?
No, but you can use

Code:

>>> import binascii
>>> binascii.hexlify(b' \x80', ' ')
b'20 80'

Re: bits to bytes

Posted: Sun Jun 26, 2022 11:22 pm
by KJM
So no easy way to get b'40 80' ? Seems to me python is being as slippery with octal/hex here as it sometimes is with types?

Re: bits to bytes

Posted: Mon Jun 27, 2022 2:16 am
by jimmo
KJM wrote:
Sun Jun 26, 2022 11:22 pm
So no easy way to get b'40 80' ? Seems to me python is being as slippery with octal/hex here as it sometimes is with types?
Python's rule for printing bytes (or bytearrays) is to show the character if it's printable ASCII, otherwise the escape sequence. The actual data is correct though, and if you send it over the network (e.g. LoRa) then it will be fine.

Unfortunately there's no way to turn off this behavior, but hexlify is the best option for printing it out in hexadecimal (and it's clearer than having it with all the \x in the middle).
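A quick check along these lines may help, offered as a sketch: the space character is byte value 0x20 in hex (40 in octal, 32 in decimal), so `b' \x80'` and `b'\x20\x80'` are two spellings of the same two bytes.

```python
data = b' \x80'

# Same two bytes, written once with the printable character and once
# with explicit hex escapes.
assert data == b'\x20\x80'

print(list(data))              # [32, 128]
print([hex(x) for x in data])  # ['0x20', '0x80']
print(oct(0x20))               # 0o40
```

Only the display differs; the values that go out over the radio are 0x20 and 0x80 either way.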

Re: bits to bytes

Posted: Mon Jun 27, 2022 3:10 am
by KJM
My head says you're right jimmo & it will be fine over LoRa but my heart is saying 'you can't trust it, use chr/ord (where wysiwyg) instead'

Re: bits to bytes

Posted: Mon Jun 27, 2022 5:42 am
by jimmo
KJM wrote:
Mon Jun 27, 2022 3:10 am
My head says you're right jimmo & it will be fine over LoRa but my heart is saying 'you can't trust it, use chr/ord (where wysiwyg) instead'
Maybe this will help you.

Code:

>>> b=bytes((0x08, 0x50, 0xf0,))
>>> print(b)
b'\x08P\xf0'
>>> import binascii
>>> print(f'0x{binascii.hexlify(b).decode()}')
0x0850f0
>>> print(list(i for i in b))
[8, 80, 240]
>>> print(list(hex(i) for i in b))
['0x8', '0x50', '0xf0']
So we have three bytes, the second of which is printable. Printing them will escape the first and last.

If I want to see their actual hex values, I can either use hexlify to see them as two-character hex (the way I did it above is my preferred way of seeing this, but using : as the separator is also convenient), or I can print their actual numerical values (first as decimal, second as hexadecimal strings).

There's nothing funny going on; bytes are just bytes. The only perhaps surprising thing is that printing a bytes or bytearray object will not escape printable characters.
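One way to convince yourself that the data is intact, sketched here with a hypothetical helper called `unpack_bits` (an assumption, not something from the thread), is to turn the packed bytes back into the original bit list. It assumes the first bit of the list was stored in the most-significant bit of each byte.

```python
def unpack_bits(data, count):
    """Return the first `count` bits of `data`, MSB of each byte first.

    Hypothetical helper for illustration only.
    """
    return [(data[i >> 3] >> (7 - (i & 7))) & 1 for i in range(count)]

print(unpack_bits(b'\x10\x80', 12))  # [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
print(unpack_bits(b' \x80', 12))     # [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

The second call recovers the "jumbled" list even though its first byte displays as a space.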

(One annoying thing in Python is that hexlify returns a bytes if the input is bytes, which is why we call decode to convert it to str. It's an unfortunate historical accident in Python. They fixed this when they added bytes.hex, but unfortunately we do not support that in MicroPython... see https://github.com/micropython/micropython/pull/7539.)
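If the decode step is a nuisance, a small formatting helper is one possible workaround. The sketch below uses a hypothetical function named `to_hex` (an assumption, not a library call) built only on `str.format` and `str.join`, so it returns a str directly.

```python
def to_hex(data, sep=' '):
    """Format bytes as two-digit hex values joined by `sep`.

    Hypothetical helper; returns a str, so no decode() is needed.
    """
    return sep.join('{:02x}'.format(x) for x in data)

print(to_hex(b' \x80'))           # 20 80
print(to_hex(b'\x08P\xf0', ':'))  # 08:50:f0
print(to_hex(b'\x08P\xf0', ''))   # 0850f0
```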

Re: bits to bytes

Posted: Tue Jun 28, 2022 6:00 am
by KJM
Thanks. When I google 'escape' I get a lot of stuff about 'escape characters' but I don't think that's what you're referring to? Can you explain 'escape' in this context?

Re: bits to bytes

Posted: Tue Jun 28, 2022 3:08 pm
by jimmo
KJM wrote:
Tue Jun 28, 2022 6:00 am
Can you explain 'escape' in this context?
Yes I mean escape characters -- in Python (like C, Java, etc), the backslash is the escape character. It indicates the start of a sequence to let you tell the compiler exactly what character you want -- useful if that character isn't printable. So for example \n for newline (which is the byte value 10 in decimal, or 0x0a in hex).

In Python, \xXX lets you specify the hexadecimal byte value. "\x0a" is equivalent to "\n". Or like in the previous example, "\x50" is the same as "P".
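These equivalences can be checked directly; a minimal sketch:

```python
# An escape sequence and the character it names are the same one-character string.
assert "\x0a" == "\n"
assert "\x50" == "P"

print(len("\x50"))          # 1
print(ord("\n"), ord("P"))  # 10 80
```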

This is why, when you print a bytes object (or the repr of a string) containing an unprintable character, Python turns it into this format as a convenience -- you can copy and paste it back in as a literal and the compiler will understand it.
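The copy-and-paste property can be demonstrated with a short sketch. Here `eval` stands in for pasting the printed text back into the REPL, and is used for demonstration only.

```python
data = bytes((0x08, 0x50, 0xf0))

text = repr(data)  # the same text the REPL shows for this value
print(text)        # b'\x08P\xf0'

# Reading that text back as a literal gives the identical bytes.
assert eval(text) == data
```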

(Other languages have a different escape character... for example, %20 in a URL means hex 20, which is space.)