bits to bytes

General discussions and questions about development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: bits to bytes

Post by martincho » Tue Jun 28, 2022 4:34 pm

KJM wrote:
Sun Jun 26, 2022 11:22 pm
So no easy way to get b'40 80' ? Seems to me python is being as slippery with octal/hex here as it sometimes is with types?
When debugging I always do something like this:

Code: Select all

[hex(n) for n in bytes_or_bytearray_variable]
<Rant mode on>
I can't stand some aspects of the Python bytes and bytearray implementation. It feels unnecessarily complex, clunky and lacking important features for anyone who needs to manage data at this level.

With regards to display, I have absolutely no use for the way Python chooses to display either of them. If you are working at the hardware level moving bits and bytes around you do not want to see something like b'\x02\x05(*^%trQ\x09'.

There are far better ways to implement these data types. Sadly this is now baked in. My guess is that the (full Python) developers don't have a lot of exposure to low-level code, hence the joining at the hip of the bytes and bytearray types to character encoding, which, in my not-so-humble opinion, is a bad idea.
<Rant mode off>

KJM
Posts: 158
Joined: Sun Nov 18, 2018 10:53 pm
Location: Sydney AU

Re: bits to bytes

Post by KJM » Tue Jun 28, 2022 11:23 pm

I sympathise. Some days it just feels like there are too many layers of abstraction between me & the ESP32. In my salad days I'd stick a CRO probe on the modulation pin of a radio transmitter to check the data; nowadays I'm reduced to checking the duration of a raw lora burst in an effort to verify how many bytes were actually sent.

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: bits to bytes

Post by martincho » Wed Jun 29, 2022 2:53 am

I can appreciate that. I go back to real VT-100 terminals. I get it.

User avatar
pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK
Contact:

Re: bits to bytes

Post by pythoncoder » Wed Jun 29, 2022 7:40 am

Oh, come on lads ;) I started when the ASR33 was the latest thing and "glass teletypes" were science fiction. An old dog can learn new tricks, it just takes a little longer.

I don't like the default way that Python formats bytes objects and bytearrays, but it doesn't change the fact that it is a fine language for bit manipulations. If you don't like what ubinascii.hexlify does, there are plenty of ways to write your own with formatting to suit your taste.

Code: Select all

foo = lambda a : f"{int.from_bytes(a, 'big'):0{2*len(a)}x}"
foo(bytearray((0x01, 0x02, 0x03, 0xfe)))
'010203fe'
Peter Hinch
Index to my micropython libraries.

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: bits to bytes

Post by martincho » Wed Jun 29, 2022 6:57 pm

Well, ASR33. OK, you win. I did use teletypes at university for a very short time (maybe one term). I remember a really interesting one that used spark discharge to print on the paper (how many people know that, at some point in time, terminals actually used paper and did not have a display?). It used a special waxed paper. The sparks would burn the wax to print. Crazy stuff.

Regarding bytes/bytearrays. When I need to get into the details I tend to use this simple hex dump tool I wrote. It makes it far easier to look at data, from, say, a communications protocol or a chip you might be talking to.

Code: Select all

# Hex dump bytes and bytearrays

_BYTES_PER_LINE = 16
_ADDRESS_WIDTH = 8
_SPACE_TO_HEX = 1
_SPACE_TO_ASCII = 1
_SPACE_BETWEEN_HEX = 1
_DISTANCE_TO_ASCII = _BYTES_PER_LINE * (2 + _SPACE_BETWEEN_HEX) + _SPACE_TO_ASCII


def hexdump(data: bytes, show_ascii: bool = True):
    length = len(data)
    if not length:
        print("Empty")
        return
    # Print header
    header = " " * (_ADDRESS_WIDTH + _SPACE_TO_HEX)
    for i in range(_BYTES_PER_LINE):
        header += f"{i:02X}{' ' * _SPACE_BETWEEN_HEX}"
    if show_ascii:
        header += " " * _SPACE_TO_ASCII
        for i in range(_BYTES_PER_LINE):
            header += f"{i:1X}"
    print(header)
    # Print hex dump, one row of _BYTES_PER_LINE bytes at a time
    lines = (length + _BYTES_PER_LINE - 1) // _BYTES_PER_LINE  # ceiling division
    address = 0
    for row in range(lines):
        line = f"{address:0{_ADDRESS_WIDTH}X}{' ' * _SPACE_TO_HEX}"
        hex_part = ""  # avoid shadowing the hex() builtin
        text = ""
        chunk = min(_BYTES_PER_LINE, length - address)
        for offset in range(chunk):
            byte = data[address + offset]
            hex_part += f"{byte:02X}{' ' * _SPACE_BETWEEN_HEX}"
            if show_ascii:
                is_visible = 32 <= byte < 128
                text += chr(byte) if is_visible else "."

        line += hex_part + " " * (_DISTANCE_TO_ASCII - len(hex_part)) + text
        address += _BYTES_PER_LINE
        print(line)
    print()


if __name__ == "__main__":
    # Test
    ba = bytearray([n%256 for n in range(35)])
    hexdump(ba)

    ba = bytes([n%256 for n in range(210)])
    hexdump(ba)
There's room for improvement there. I chip away at it every so often. One interesting variant might be to make it work with 16 and 32 bit words, not just 8 bits. I have another variant that is protocol-aware and formats the output to make it easier to understand what might be going on.

One of the things I wish they had done with this issue of just working with bytes is to create the option for a mutable, yet fixed length, data structure. In other words, like bytes, yet mutable only for content. A bytes data structure in memory is far simpler than a bytearray and faster to process. This would be very useful for keeping memory allocation and garbage collection under control when you know that you can work with a fixed length buffer.
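One partial workaround, at least in CPython (MicroPython's buffer semantics may differ, so test on-device), is to preallocate a bytearray once and work through a memoryview: the contents stay mutable, and while the view is alive CPython refuses to resize the underlying bytearray, so the length is effectively locked:

```python
buf = bytearray(8)       # one allocation, up front
view = memoryview(buf)

view[0] = 0x40           # contents are freely mutable in place
view[1:3] = b"\x80\xff"

try:
    buf.extend(b"\x00")  # resizing is blocked while the view exists
except BufferError:
    pass                 # CPython raises "Existing exports of data ..."
```

This doesn't simplify the in-memory representation, but it does give you a fixed-length, content-mutable buffer you can hand around without risking reallocation.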

This isn't going to change, so it is what it is. Even if I wanted to write my own, the problem is that so many routines return bytes() that you just can't escape the memory allocation overhead. Something like the serial readinto() method wants a bytearray, which brings the more complex data structure into the fold. Not sure if it will work with arrays; I haven't tried it. Also not sure whether that would be an advantage or not.
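For what it's worth, in CPython the readinto() protocol accepts any writable buffer, not just a bytearray; here is a quick desktop check using BytesIO as a stand-in stream (whether a given MicroPython port's stream classes are as flexible is worth testing on-device):

```python
import io
from array import array

stream = io.BytesIO(b"\x01\x02\x03\x04\x05\x06\x07\x08")

# Reuse one preallocated bytearray: no per-read bytes() allocation
buf = bytearray(4)
n1 = stream.readinto(buf)

# array.array is also an acceptable target...
arr = array("B", [0, 0])
n2 = stream.readinto(arr)

# ...as is a memoryview slice, to fill part of a larger buffer
big = bytearray(16)
n3 = stream.readinto(memoryview(big)[:2])
```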

I'll be done with a fairly intensive project in a few weeks. After that I have a bunch of notes on things I would really like to look into and perhaps see if I may be able to contribute to the codebase. I've done a lot with data compression and expansion. There are some issues with the MicroPython unzip library that need to be fixed (or documented so people can use it). I have also identified a lot of opportunities to make MicroPython faster. That might be one of my initial targets for contribution.

User avatar
jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: bits to bytes

Post by jimmo » Wed Jun 29, 2022 11:37 pm

martincho wrote:
Wed Jun 29, 2022 6:57 pm
I have also identified a lot of opportunities to make the MicroPython faster.
Great!

One of the most useful tools for improving MicroPython's performance is the performance test suite. See https://github.com/micropython/micropyt ... perf_bench for details.

One thing I would like to understand better is how well it matches real-world use cases. i.e. if we improve the performance tests by X%, what does that correspond to in terms of real user experience. Or conversely, if we save X kiB of code size at a Y% performance hit, does that really matter or not?

So far most of my attempts at performance have had disappointingly small returns (other than https://github.com/micropython/micropython/pull/7680 and https://github.com/micropython/micropython/pull/7688). Those PRs might be useful to you though in terms of the approach.

You also might find https://github.com/micropython/micropython/pull/5926 useful for hunting down hotspots (and note that the top hit found there is soon to be fixed in https://github.com/micropython/micropython/pull/6896)

User avatar
jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: bits to bytes

Post by jimmo » Wed Jun 29, 2022 11:56 pm

martincho wrote:
Wed Jun 29, 2022 6:57 pm
Regarding bytes/bytearrays. When I need to get into the details I tend to use this simple hex dump tool I wrote. It makes it far easier to look at data, from, say, a communications protocol or a chip you might be talking to.
FYI, https://github.com/micropython/micropyt ... ils/xxd.py

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: bits to bytes

Post by martincho » Thu Jun 30, 2022 12:34 am

jimmo wrote:
Wed Jun 29, 2022 11:37 pm
martincho wrote:
Wed Jun 29, 2022 6:57 pm
I have also identified a lot of opportunities to make the MicroPython faster.
Great!

One of the most useful tools for improving MicroPython's performance is the performance test suite. See https://github.com/micropython/micropyt ... perf_bench for details.
I'll probably have some time to dig into this in a few (8 to 10) weeks. I've been having to dig through the MicroPython source to understand a number of things. That has been a very superficial intro to the codebase. It will be a while before I really know my way around.

I've been working on three areas: high-speed, low-latency communications; high-speed CRC-16 and CRC-32 calculation (both one-shot and cumulative); and file compression/decompression.
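As an aside, the usual pure-Python route to fast cumulative CRCs is a 256-entry lookup table, so each byte costs one shift, one XOR, and one table hit. This is a generic CRC-16/CCITT-FALSE sketch for illustration, not the implementation described above:

```python
def _make_crc16_table(poly=0x1021):
    # Precompute the 256 partial remainders for the MSB-first algorithm
    table = []
    for byte in range(256):
        crc = byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if (crc & 0x8000) else (crc << 1)
        table.append(crc & 0xFFFF)
    return table

_CRC16_TABLE = _make_crc16_table()

def crc16(data, crc=0xFFFF):
    # Cumulative use: feed the previous return value back in as `crc`
    for b in data:
        crc = ((crc << 8) & 0xFFFF) ^ _CRC16_TABLE[((crc >> 8) ^ b) & 0xFF]
    return crc

# One-shot and chunked (cumulative) calls give the same answer
whole = crc16(b"123456789")
split = crc16(b"6789", crc16(b"12345"))
```

On MicroPython the inner loop is a natural candidate for @micropython.viper or native code if pure Python turns out to be too slow.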

On the decompression front, I think there's an issue with regards to the constraints MicroPython is under in the case of microcontrollers.

The standard zlib algorithm uses a 2**15 window size, which, along with the sliding window requirements, results in allocations on the order of 64K. On MicroPython running on a small system (say, RP2040) this is pretty much impossible to support. The problem is that you can't use the stock Python zipfile library to make archives because it has the 2**15 window requirement baked in.

I experimented a lot with this and ended up creating my own variant of zipfile using a 2**10 window for archive creation. I then built a chunked unzipping tool to work with the same window. The idea is that you can't read a full 100K file into memory and decompress it into 300K for storage. The chunked decompression tool uses a variable chunk size that can be pretty small (128 bytes). This uses zlib.DecompIO(), which works pretty well so long as the compressed file used a small enough window.
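The chunked scheme can be sketched with the streaming zlib API. This is a desktop-Python illustration rather than the poster's tool (on MicroPython the reading side would be zlib.DecompIO(stream, wbits)), and the window size, chunk size, and sample data are all made up for the demo:

```python
import io
import zlib

WBITS = 10    # 2**10 = 1 KiB window instead of zlib's default 2**15
CHUNK = 128   # small read size, as described above

# Compress with the reduced window (negative wbits = raw deflate stream)
raw = bytes(n % 251 for n in range(100_000))   # ~100K of sample data
comp = zlib.compressobj(level=9, wbits=-WBITS)
packed = comp.compress(raw) + comp.flush()

# Decompress incrementally: feed small slices and write the output as it
# appears, so the decompressor never holds the whole file in memory.
# (A strictly bounded version would also cap output via decompress(max_length).)
decomp = zlib.decompressobj(wbits=-WBITS)
src = io.BytesIO(packed)
out = io.BytesIO()   # stands in for a file written a piece at a time
while True:
    piece = src.read(CHUNK)
    if not piece:
        break
    out.write(decomp.decompress(piece))
out.write(decomp.flush())
```

The key point is that the decompressor's wbits must be at least as large as the window the archive was compressed with, which is why controlling the window at archive-creation time matters.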

Without being able to create archives with a controlled window size, it would always eventually fail with allocation errors in the 2**15 range. One could probably garbage collect along the way. However, when you have such tight memory limitations, allocating 32K to 64K at a time is, I think, a bad idea. I'd rather use up a lot less memory.

Using this approach, I have successfully decompressed 100+K archives that expanded into 400K of data while only allocating about 3K, which is really nice. The files get written incrementally as you decompress small chunks at a time. Full 32-bit CRC checking per file ensures integrity.

I need to understand this a lot better before letting it out into the wild or daring to suggest it might be an interesting addition to MP.

martincho
Posts: 96
Joined: Mon May 16, 2022 9:59 pm

Re: bits to bytes

Post by martincho » Thu Jun 30, 2022 4:30 am

jimmo wrote:
Wed Jun 29, 2022 11:56 pm
martincho wrote:
Wed Jun 29, 2022 6:57 pm
Regarding bytes/bytearrays. When I need to get into the details I tend to use this simple hex dump tool I wrote. It makes it far easier to look at data, from, say, a communications protocol or a chip you might be talking to.
FYI, https://github.com/micropython/micropyt ... ils/xxd.py
That's nice.

I tend to have three requirements for software:

1- Comments. No excuses. Lots of comments.
2- I have to be able to read it in ten years and know what it does and how it works in just a few minutes (hence #1).
3- Clearly show usage with examples. One or more tests is usually good. Goes with #2.

I have gone back into codebases I wrote 15 to 20 years ago and was able to get back on track quickly because of these requirements.

Back in the 80's I spent almost ten years working with a language called APL. It's incredibly powerful, yet you can (and often do) write code that almost nobody can decipher. This is when my rules took shape and I have lived by them since.

If you have never seen APL, here's a short video demo that is well worth watching:

https://www.youtube.com/watch?v=a9xAKttWgP4

I know people have different styles. This is just the way I do it.

User avatar
jimmo
Posts: 2754
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: bits to bytes

Post by jimmo » Thu Jun 30, 2022 4:53 am

martincho wrote:
Thu Jun 30, 2022 4:30 am
I tend to have three requirements for software:
No argument here! :) I didn't write it, and it was just by amazing coincidence I saw it this morning when I was browsing the micropython-example-boards repo to reply to viewtopic.php?f=15&t=12616&p=68421#p68421

About 20 years ago a colleague taught me "you should be able to make your eyes only read the green text and still know exactly what the code does and why". (Substitute green for red or whatever your editor colours comments as). Working with his code was a joy.
