bits to bytes

General discussions and questions about development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: bits to bytes

Post by martincho » Thu Jul 07, 2022 4:18 pm

KJM wrote:
Thu Jul 07, 2022 9:13 am
Something is troubling me Rob. It should take 2 bytes to represent 16 bits. But if I convert my 16 bit string of 1s & 0s to a character with d=int(bis, 2); bys=hex(d); r=chr(int(bys)) & send that over raw lora (without encryption) the receiver says it's received 1 byte. How is it possible to cram 65536 combinations into a single byte? I mean there could be 65536 unique characters but I can't understand how they can all be sent as variations of a single byte? To my mind a single byte can only handle 8 bits not 16?
OK, here's a simple test function that runs through every possible 16 bit number, converts to various forms and checks for errors. Not sure this will help, but maybe it will clarify a few things. You can choose to convert to binary string representation as part of the conversion chain, or not. Just provide "True" as the argument if you want to test with binary conversion.

The objective wasn't to showcase the most optimal form for the various conversions, but rather to show a series of potentially relevant conversions from integer through other forms of representation and back to integer and do so without errors.

Code: Select all

# Generate every possible 16 bit number
# Convert it to two 8 bit numbers
# Convert it to bytes data type
# Convert back to a 16 bit number and flag if they do not match
#
def int16_to_bytes_test(convert_to_binary_first=False):
    for n in range(0,2**16):
        n = n & 0xFFFF  # Unnecessary. Just being paranoid to ensure we only have 16 bits

        if convert_to_binary_first:
            # To a 16 bit binary string representation
            n_in_binary = f"{n:016b}"
            # Convert into a tuple of two 8 bit numbers
            # Doing it the hard way
            broken_into_two_8_bit_numbers = int(n_in_binary[:8], 2), int(n_in_binary[8:], 2)
        else:
            # 16 bit to a tuple with two integers limited to a 0 to 255 range (8 bits)
            broken_into_two_8_bit_numbers = n // 256, n % 256

        # Convert to bytes
        bytes_from_tuple = bytes(broken_into_two_8_bit_numbers)

        # Convert back to 16 bit integer and compare
        from_bytes_to_16_bit = bytes_from_tuple[0] * 256 + bytes_from_tuple[1]

        # DEBUG ***** uncomment to see all data as generated
        # print(f"n:{n:5d}   two_8: {broken_into_two_8_bit_numbers}   bytes: {str(bytes_from_tuple):>12}   to_16:{from_bytes_to_16_bit:5d}")

        # Compare and flag if they DO NOT match
        if from_bytes_to_16_bit != n:
            print(f"Error: {n} does not match the converted {from_bytes_to_16_bit}")
            return

    print("Every number passed the test")


int16_to_bytes_test(True)
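For comparison, the same round trip can be sketched with the built-in `int.to_bytes` and `int.from_bytes`. This is not from the original post; the helper name `split_and_join` is made up for illustration, and it assumes your MicroPython port accepts both calls with positional arguments the way CPython does.

```python
def split_and_join(n):
    """Round-trip one 16 bit value through a 2 byte, big-endian bytes object."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    two_bytes = n.to_bytes(2, "big")              # high byte first
    restored = int.from_bytes(two_bytes, "big")   # back to an integer
    return two_bytes, restored


# Run every 16 bit value through the helper and collect any mismatches
failures = [n for n in range(2 ** 16) if split_and_join(n)[1] != n]
print(len(failures))
print(split_and_join(0xF7FA)[0])
```

An empty `failures` list means every value survived the trip, which is the same check the function above performs by hand.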

KJM
Posts: 158
Joined: Sun Nov 18, 2018 10:53 pm
Location: Sydney AU

Re: bits to bytes

Post by KJM » Fri Jul 08, 2022 6:58 am

tnx for that, I see the error of my ways with trying to use chr, I'll swap to struct. One thing is still puzzling me though. When I was dabbling with chr I'd get

Code: Select all

bis='1111011111111010' 
d=int(bis, 2)
h=hex(d)
r=chr(int(h, 16))
print(r, type(r), len(r), repr(r), type(repr(r)), len(repr(r)))

 <class 'str'> 1 '\uf7fa' <class 'str'> 8
But when I send this character over LoRa, the receiver reports 3 bytes. Am I right in thinking that those 3 bytes were probably \u f7 fa? What's bugging me is I can't get anything other than the LoRa receiver to report 3 bytes for r. I figure if I really had a handle on the different classes and their various representations I should be able to get Python to report the character or \uf7fa as 3 bytes, but I can't; only the LoRa receiver sees the length as 3. Any ideas, or have I worn out my welcome?

KJM
Posts: 158
Joined: Sun Nov 18, 2018 10:53 pm
Location: Sydney AU

Re: bits to bytes

Post by KJM » Fri Jul 08, 2022 8:09 am

You're not paranoid if they really are out to get you! I think the Python gods have got it in for me. Checking struct.pack at the cmd prompt this avo

Code: Select all

>>> bis='11110'
>>> struct.pack('<H', sum(int(bis)*(1<<(i^7)) for i, b in enumerate(bis)))
b'\xd0\n'
I mean WTF!
So I reset the micro & try again

Code: Select all

>>> bis='11110'
>>> struct.pack('<H', sum(int(bis)*(1<<(i^7)) for i, b in enumerate(bis)))
b'\xf0\x00'
If this keeps up I'm gonna be OCD, double & triple checking everything. Am I missing something, am I supposed to do a gc between struct.packs?

TheSilverBullet
Posts: 50
Joined: Thu Jul 07, 2022 7:40 am

Re: bits to bytes

Post by TheSilverBullet » Fri Jul 08, 2022 8:39 am

…What… are you trying to achieve?
'11110' is a string representation of 0b11110, which is the same as 30(dec), which is the same as 0x1e.
So if you have a string of 1's and 0's, you can convert that to decimal using:
>>> bis = '11110'
>>> int(bis, 2) # from binary to decimal
30
If it's more than 255 then you need multiple bytes.

>>> bis = '1111000010100101'
>>> n = int(bis, 2)
>>> print(n)
61605
That's two bytes so if you need something supporting the buffer protocol, use a bytearray.
>>> ba = bytearray((n >> 8, n & 0xff))
>>> print(ba, ba[0], ba[1])
bytearray(b'\xf0\xa5') 240 165

That bytearray is two bytes long and supports the buffer protocol.
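A minimal sketch that generalises the steps above to a bit string of any length. The helper name `bit_string_to_bytearray` is hypothetical, and the choice to put the most significant byte first is an assumption that matches the two-byte example.

```python
def bit_string_to_bytearray(bits):
    """Convert a string of '0'/'1' characters to a bytearray, high byte first."""
    if not bits or any(c not in "01" for c in bits):
        raise ValueError("expected a non-empty string of 0s and 1s")
    n = int(bits, 2)
    length = (len(bits) + 7) // 8          # number of bytes, rounded up
    out = bytearray(length)
    for i in range(length - 1, -1, -1):    # fill from the last byte backwards
        out[i] = n & 0xFF
        n >>= 8
    return out


print(bit_string_to_bytearray("1111000010100101"))
print(bit_string_to_bytearray("11110"))
```

Short inputs such as '11110' end up right-aligned in a single byte (0x1e), which is the same value int(bis, 2) gives.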

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: bits to bytes

Post by jimmo » Fri Jul 08, 2022 12:47 pm

KJM wrote:
Fri Jul 08, 2022 8:09 am
You're not paranoid if they really are out to get you! I think the Python gods have got it in for me. Checking struct.pack at the cmd prompt this avo

Code: Select all

>>> bis='11110'
>>> struct.pack('<H', sum(int(bis)*(1<<(i^7)) for i, b in enumerate(bis)))
b'\xd0\n'
I mean WTF!
So I reset the micro & try again

Code: Select all

>>> bis='11110'
>>> struct.pack('<H', sum(int(bis)*(1<<(i^7)) for i, b in enumerate(bis)))
b'\xf0\x00'
If this keeps up I'm gonna be OCD, double & triple checking everything. Am I missing something, am I supposed to do a gc between struct.packs?
The code is wrong. Where you have written `int(bis)` it should be `int(b)`. Please refer back to the second comment on this thread (from Christian) -- viewtopic.php?f=2&t=12597#p68301
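For reference, here is the quoted one-liner with only that change applied (`int(b)` instead of `int(bis)`), split over a few lines so the intermediate value is visible. This is a sketch run under regular Python; nothing else in the expression is altered.

```python
import struct

bis = "11110"
# int(b) converts one character per pass; int(bis) would convert the
# whole string to the decimal number 11110 on every pass.
value = sum(int(b) * (1 << (i ^ 7)) for i, b in enumerate(bis))
packed = struct.pack("<H", value)
print(value, packed)
```

The five bits land in the top of the low byte, so `value` is 240 (0xf0) and the packed result is b'\xf0\x00'.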

karfas
Posts: 193
Joined: Sat Jan 16, 2021 12:53 pm
Location: Vienna, Austria

Re: bits to bytes

Post by karfas » Fri Jul 08, 2022 1:05 pm

KJM wrote:
Fri Jul 08, 2022 8:09 am
>>> struct.pack('<H', sum(int(bis)*(1<<(i^7)) for i, b in enumerate(bis)))
b'\xf0\x00'
For me, this is way too complicated.
As you can get the integer representation of your bits for cheap via int(bis, 2) you are already fine to pack this.
I don't see any value in looping through the string.

Code: Select all

>>> import struct
>>> bis='11110'
>>> value=int(bis,2)
>>> hex(value)
'0x1e'
>>> packed=struct.pack('<H', value)
>>> packed[0]
30
>>> packed[1]
0
A few hours of debugging might save you from minutes of reading the documentation! :D
My repositories: https://github.com/karfas

jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia

Re: bits to bytes

Post by jimmo » Fri Jul 08, 2022 2:27 pm

karfas wrote:
Fri Jul 08, 2022 1:05 pm
For me, this is way too complicated.
As you can get the integer representation of your bits for cheap via int(bis, 2) you are already fine to pack this.
I don't see any value in looping through the string.
Yes, exactly.

KJM: The original question you asked had an "array of integers" and this approach was a way to solve that. When you changed it to a string of "1" and "0", the correct replacement is just int(bis, 2).

In summary:

Code: Select all

bits_as_string = '11110'
value=int(bits_as_string, 2)
OR:

Code: Select all

bits_as_list = [1,1,1,1,0]
value = sum(b*(1<<(i^7)) for i, b in enumerate(bits_as_list))
BUT. We're going around in circles. If your problem is "take 16 open/closed sensors and turn it into a two byte value", then just use the code I wrote here: viewtopic.php?f=2&t=12597&start=20#p68627
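One detail worth checking when switching between the two forms: they do not always produce the same integer. The sketch below is not the code from the linked post; it only compares the two expressions from the summary. With 16 bits, the list form paired with '<H' and the string form paired with '>H' give the same bytes on the wire. With fewer than 8 bits, the list form left-aligns the bits in a byte while int(bits, 2) right-aligns them.

```python
import struct

bits_as_list = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
bits_as_string = "".join(str(b) for b in bits_as_list)

from_string = int(bits_as_string, 2)
from_list = sum(b * (1 << (i ^ 7)) for i, b in enumerate(bits_as_list))

# Different integers, but the same two bytes once the byte order is chosen to match
print(hex(from_string), hex(from_list))
print(struct.pack(">H", from_string), struct.pack("<H", from_list))

# With only five bits the two forms disagree on the value itself
short_string = int("11110", 2)
short_list = sum(b * (1 << (i ^ 7)) for i, b in enumerate([1, 1, 1, 1, 0]))
print(short_string, short_list)
```

The i ^ 7 trick flips the low three bits of the index, so list items 0 to 7 fill the low byte from its top bit down and items 8 to 15 do the same for the high byte.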

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: bits to bytes

Post by martincho » Fri Jul 08, 2022 2:52 pm

KJM wrote:
Fri Jul 08, 2022 6:58 am
But when I send this character over LoRa, the receiver reports 3 bytes. Am I right in thinking that those 3 bytes were probably \u f7 fa? What's bugging me is I can't get anything other than the LoRa receiver to report 3 bytes for r. I figure if I really had a handle on the different classes and their various representations I should be able to get Python to report the character or \uf7fa as 3 bytes, but I can't; only the LoRa receiver sees the length as 3. Any ideas, or have I worn out my welcome?
I think you are getting yourself wrapped around an axle. It's easy to focus on something so intently that one does not see the obvious. We've all done it, so, no worries, you are not alone. A long time ago it took me six months to find an error in an Excel spreadsheet I created to calculate the coefficients for a polyphase FIR filter I implemented in an FPGA. I worked on this no less than 12 hours a day trying to understand why the FPGA was not doing what I told it to do. It turns out it was doing exactly what I told it.

Anyhow. '\uF7FA' is a unicode character. Somewhere along the line, in your code, it is likely being encoded into UTF-8. Not sure where or how. Is it the driver/library you might be using? Hard to say.

Well, when you encode 0xF7FA into UTF-8, what do you get?

Code: Select all

>>> '\uf7fa'.encode("UTF-8")
b'\xef\x9f\xba'
That's your three bytes. Lovely, isn't it?

I think the point is that you really don't need to create a situation where you go through a unicode encoding to talk to the outside world. Whether you use struct or baseline Python data types, keep control of what you create every step of the way. That was part of the intent of the silly code I posted showing conversion of 16 bit quantities to different representations to eventually end-up with the same 16 bit number.
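To make the difference concrete, here is a small sketch comparing the three lengths involved. Whether the LoRa driver really encodes strings as UTF-8 is an assumption; the code only shows what such an encoding would produce next to an explicit two-byte pack.

```python
import struct

n = 0xF7FA
as_text = chr(n)                     # one Unicode character, not one byte
as_utf8 = as_text.encode("utf-8")    # what a text-oriented driver might send
as_packed = struct.pack(">H", n)     # exactly two bytes, high byte first

print(len(as_text), len(as_utf8), len(as_packed))
print(as_utf8, as_packed)
```

len() on a str counts characters, so Python reports 1 for the string; the 3 only appears once the string has been encoded to bytes.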

Hope this helps.

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: bits to bytes

Post by martincho » Fri Jul 08, 2022 4:11 pm

KJM wrote:
Fri Jul 08, 2022 8:09 am

Code: Select all

>>> bis='11110'
>>> struct.pack('<H', sum(int(bis)*(1<<(i^7)) for i, b in enumerate(bis)))
b'\xd0\n'
Don't do that. You are just confusing yourself.

Do not mash a bunch of code into one line. It might look cool, but it is hard to understand and debug.

You can do that if you want after fully validating the code. Not before. First, understand. Then you can optimize.

Case in point:

Code: Select all

>>> import struct
>>> bis = '11110'
>>> bis
'11110'
>>> type(bis)
<class 'str'>
>>> the_number = sum(int(bis)*(1<<(i^7)) for i, b in enumerate(bis))
>>> the_number
2755280  # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< LOOK AT THIS NUMBER
This is not a 16 bit quantity. This requires a minimum of three bytes to store and transmit, and, depending on the system, will result in four or eight bytes in memory.

And then you are telling struct to pack it into 16 bits:

Code: Select all

>>> struct.pack("H", the_number)
b'\xd0\n'
As a tuple to reinforce the point:

Code: Select all

>>> tuple(struct.pack("H", the_number))
(208, 10)
If you tried to do this with full Python you'd get:

Code: Select all

import struct
bis = '11110'
the_number = sum(int(bis)*(1<<(i^7)) for i, b in enumerate(bis))
the_number
2755280
struct.pack("H", the_number)
Traceback (most recent call last):
  File "C:\Python\Python39\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
struct.error: ushort format requires 0 <= number <= 0xffff
Not sure why MicroPython isn't throwing an error. That's irrelevant. You are asking it to pack a value that is greater than 0xFFFF into a two byte slot. That's never going to work, error or not.
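The bytes MicroPython returned match what you get if only the low 16 bits of the number are kept. That MicroPython truncates this way internally is an inference from the output above, not something confirmed in this thread. The sketch below shows the arithmetic, plus a hypothetical guard (`pack_u16`, a name made up here) that rejects oversized values on either platform.

```python
import struct


def pack_u16(value):
    """Pack value as little-endian unsigned 16 bits, rejecting out-of-range input."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("%d does not fit in 16 bits" % value)
    return struct.pack("<H", value)


the_number = 2755280
low_16_bits = the_number & 0xFFFF      # discard everything above bit 15
print(hex(the_number), hex(low_16_bits))
print(pack_u16(low_16_bits))

try:
    pack_u16(the_number)
except ValueError as exc:
    print("rejected:", exc)
```

0x2a0ad0 masked to 16 bits is 0x0ad0, and 0x0a is the newline character, which is why the second byte prints as \n.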

And yet, '11110' isn't 2755280. How did you get there?

Once again, packing that complex bunch of code into a single line does nothing useful and is very confusing. I can't read it. And I have been coding on all kinds of platforms and languages for 40 years.

I had to break it down to understand how we got from 11110b, which is 30 in decimal, to that massive number:

Code: Select all

bis = '11110'
result = 0
for i, b in enumerate(bis):
    int_bis = int(bis)
    this_thing = 1 << (i ^ 7)
    intermediate = int_bis * this_thing
    print(f"int_bis:{int_bis}  this_thing:{this_thing:4d}  intermediate:{intermediate:8d}")
    result += intermediate

print(result)
The output (Python 3.9):

Code: Select all

int_bis:11110  this_thing: 128  intermediate: 1422080
int_bis:11110  this_thing:  64  intermediate:  711040
int_bis:11110  this_thing:  32  intermediate:  355520
int_bis:11110  this_thing:  16  intermediate:  177760
int_bis:11110  this_thing:   8  intermediate:   88880
2755280
Look at the effort it took to actually see what that one line of code is doing.

Please stop doing that. Go back to basics. Break things down. Look at the results every stage of the way. Validate with the range of values you will be working with (if it's 16 bits, run all 2**16 values through your code). Take an incremental approach to getting from your bits to the bytes that go out the serial port. If you do this you will succeed.

Another point. While struct is neat, it ends-up packing things into a bytes() object. You might want to avoid messing with struct until you can construct that bytes object yourself and have the data in and out do what you need it to do. Once you achieve that you can choose to use struct to provide you with a different way to handle things. I think you need to operate at the most basic level before getting confused by side effects that can be introduced by tools like struct (padding and byte alignment come to mind).
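As one illustration of the padding point, here is a sketch using `struct.calcsize`. The format strings 'BH' and '<BH' are examples chosen here, and the native size depends on the platform, so treat the first number as something to check on your own board.

```python
import struct

# Native mode (no prefix) may insert a pad byte so the 'H' is aligned;
# '<' selects little-endian with no padding at all.
native_size = struct.calcsize("BH")
packed_size = struct.calcsize("<BH")
print(native_size, packed_size)

# Building the same three bytes by hand: one byte, then 0xF0A5 low byte first
by_hand = bytes((0x01, 0xA5, 0xF0))
print(struct.pack("<BH", 0x01, 0xF0A5) == by_hand)
```

Using an explicit '<' or '>' prefix keeps the layout identical to what you would build by hand.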

Read the docs for both Python and MicroPython to know what you are working with.

KJM
Posts: 158
Joined: Sun Nov 18, 2018 10:53 pm
Location: Sydney AU

Re: bits to bytes

Post by KJM » Sat Jul 09, 2022 2:06 am

Thnx for your patient explanations gents, your persistence is appreciated.

Jimmo. I'll swap back to a list of 1 & 0 integers for the contact closures. struct represents [1,1,1,1,0] as 0xf0 which is intuitive for me because it works in comfortable 4 bit nibbles. My problem with hex(int('11110', 2)) is that the 0x1e representation is less easily verified in my head. I just find it easier to do 8+4+2+1=15=f than I do 16+8+4+2=16+14=1e

Martincho. I think you nailed it, the lora driver must massage the symbol into the UTF-8 bytes b'\xef\x9f\xba' before it sends it, a more likely explanation than my \u f7 fa grasping at straws idea

Lastly is

Code: Select all

import struct
bytesin=b'\xf0\x00'
tup=struct.unpack('<H', bytesin)
bytesout=bin(tup[0])
lst=[int(i) for i in bytesout[2:]]
the best way to recover my original list?
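One thing to watch with the bin() approach is that it drops leading zeros, so the length of the list depends on the value (for b'\xf0\x00' it yields 8 items, not 5 or 16). A possible alternative, sketched here under the assumption that the bytes were produced with the sum(b*(1<<(i^7)) ...) expression and '<H' from earlier in the thread, is to read the bits back with the same shifts. The helper name `bits_from_packed` and its `count` parameter are made up for illustration.

```python
import struct


def bits_from_packed(packed, count=16):
    """Recover the first `count` bits that were packed with 1 << (i ^ 7) and '<H'."""
    value = struct.unpack("<H", packed)[0]
    # Undo the packing: bit i of the list lives at position i ^ 7 of the integer
    return [(value >> (i ^ 7)) & 1 for i in range(count)]


print(bits_from_packed(b"\xf0\x00", 5))
print(bits_from_packed(b"\xf0\xa5"))
```

The receiver has to know how many bits were sent, since two bytes always carry 16 positions regardless of how many were in use.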
