Compression for uzlib

Maksym Galemin
Posts: 11
Joined: Mon May 28, 2018 11:48 pm

Compression for uzlib

Post by Maksym Galemin » Thu Jan 31, 2019 9:29 am

I'm wondering why the uzlib module doesn't support compression? Admittedly, the current tgzip example (uzlib ver. 2.9.2) has to copy the whole input file into RAM, but looking at the source code I can't find anything that prevents splitting the input file into data chunks and compressing each one separately into its own gzip "member" (or "compressed data set"; see RFC 1951 and RFC 1952 for details). The final compression ratio won't be as good as with a single huge gzip "member", but even with 8192-byte data chunks per gzip "member" the results don't look that bad. In any case, I think it's better to have not-very-efficient compression in the uzlib module than not to have it at all.
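To put a number on the fixed cost of splitting, here is a minimal sketch in C. It assumes the FLG = 0 header layout from RFC 1952 (10 header bytes plus 8 trailer bytes per member). The names GZIP_MEMBER_OVERHEAD, member_count and framing_overhead are made up for illustration and are not part of uzlib. It only counts framing bytes; the ratio also suffers because every member starts with an empty dictionary, which this does not measure.

```c
#include <stddef.h>

/* Fixed framing bytes per gzip member (RFC 1952) when FLG = 0:
 * 10-byte header (ID1, ID2, CM, FLG, MTIME[4], XFL, OS)
 * plus 8-byte trailer (CRC32[4], ISIZE[4]). */
enum { GZIP_MEMBER_OVERHEAD = 10 + 8 };

/* Number of members needed when total_len bytes are split into
 * pieces of at most chunk_len bytes (ceiling division). */
static size_t member_count(size_t total_len, size_t chunk_len)
{
    if (chunk_len == 0) return 0;  /* guard against division by zero */
    return total_len / chunk_len + (total_len % chunk_len != 0);
}

/* Total header + trailer bytes added by the split. */
static size_t framing_overhead(size_t total_len, size_t chunk_len)
{
    return member_count(total_len, chunk_len) * GZIP_MEMBER_OVERHEAD;
}
```

For example, a 100000-byte input with 8192-byte chunks needs 13 members, so the framing adds 234 bytes.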

Example main() code from tgzip.c (just a quick hack):

int main(int argc, char *argv[])
{
    FILE *fin, *fout;
    unsigned int len;

    printf("tgzip - example from the uzlib library\n\n");

    if (argc < 3)
    {
       printf(
          "Syntax: tgzip <source> <destination>\n\n"
          "Input is read and compressed in 8192-byte chunks, one gzip member per chunk.\n");

       return 1;
    }

    /* -- open files -- */

    if ((fin = fopen(argv[1], "rb")) == NULL) exit_error("source file");

    if ((fout = fopen(argv[2], "wb")) == NULL) exit_error("destination file");

    /* -- read source -- */

    fseek(fin, 0, SEEK_END);

    len = ftell(fin);

    fseek(fin, 0, SEEK_SET);

    unsigned crc = ~0;
    size_t size_remaining = len;
    unsigned char source[8192] = {0}; // "{}" is not valid before C23
    unsigned int hash_bits = 12;
    size_t hash_table_size = sizeof(uzlib_hash_entry_t) * (1 << hash_bits);
    uzlib_hash_entry_t hash_table[1 << hash_bits];

    while (size_remaining > 0)
    {
        size_t bytes_to_read = (size_remaining > sizeof(source)) ? sizeof(source) : size_remaining;
        if (fread(source, 1, bytes_to_read, fin) != bytes_to_read) exit_error("read");

        /* -- compress data -- */

        struct uzlib_comp comp = {0};
        comp.dict_size = 32768;
        comp.hash_bits = hash_bits;
        comp.hash_table = hash_table;
        memset(comp.hash_table, 0, hash_table_size);

        zlib_start_block(&comp.out);
        uzlib_compress(&comp, source, bytes_to_read);
        zlib_finish_block(&comp.out);

        /* -- write output -- */

        putc(0x1f, fout);
        putc(0x8b, fout);
        putc(0x08, fout);
        putc(0x00, fout); // FLG
        int mtime = 0;
        fwrite(&mtime, sizeof(mtime), 1, fout);
        putc(0x04, fout); // XFL
        putc(0x03, fout); // OS
        
        fwrite(comp.out.outbuf, 1, comp.out.outlen, fout);
        free(comp.out.outbuf); // assumes uzlib grew outbuf with realloc(); avoids a leak per member

        crc = ~uzlib_crc32(source, bytes_to_read, ~0);

        fwrite(&crc, sizeof(crc), 1, fout);
        unsigned int isize = (unsigned int)bytes_to_read; // ISIZE is 4 bytes; size_t may be 8
        fwrite(&isize, sizeof(isize), 1, fout);

        size_remaining -= bytes_to_read;
    }

    fclose(fin);
    fclose(fout);

    return 0;
}

pfalcon
Posts: 1155
Joined: Fri Feb 28, 2014 2:05 pm

Re: Compression for uzlib

Post by pfalcon » Fri Feb 01, 2019 8:51 am

"I'm wondering why uzlib module doesn't support compression?"
Obvious reason would be that nobody did that?

And doing that properly would require elaborating the API of the underlying C library (also called uzlib).
Awesome MicroPython list
Pycopy - A better MicroPython https://github.com/pfalcon/micropython
MicroPython standard library for all ports and forks - https://github.com/pfalcon/micropython-lib
More up to date docs - http://pycopy.readthedocs.io/

tylerkolden
Posts: 2
Joined: Sat Jun 08, 2019 4:39 pm

Re: Compression for uzlib

Post by tylerkolden » Sat Jun 08, 2019 4:46 pm

Hi Maksym,

Did you end up getting uzlib to support compression? I am trying to do some streaming compression in MicroPython but have not had much success finding relevant examples.

Maksym Galemin
Posts: 11
Joined: Mon May 28, 2018 11:48 pm

Re: Compression for uzlib

Post by Maksym Galemin » Mon Jun 10, 2019 9:01 am

Hi tylerkolden,

Yes, I ended up implementing .tar.gz-compatible compression for uzlib + the microtar library. I included a simple example in my original post. Basically, you need to split your input file into a number of data chunks, compress each of them into a separate gzip member, and write the members one by one into the output archive file.
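The member framing itself can be sketched without uzlib. The following is only an illustration, not the code from the .tar.gz implementation mentioned above: it swaps uzlib_compress for a single DEFLATE "stored" (uncompressed) block per member so that it stays self-contained, and it writes the trailer byte by byte so the result does not depend on host endianness or on the width of size_t. All function names (crc32_simple, put_le32, gzip_stored_member, gzip_stored_multi) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by gzip. */
static uint32_t crc32_simple(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Store a 32-bit value least-significant byte first, whatever the host. */
static uint8_t *put_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)v;
    dst[1] = (uint8_t)(v >> 8);
    dst[2] = (uint8_t)(v >> 16);
    dst[3] = (uint8_t)(v >> 24);
    return dst + 4;
}

/* Write one gzip member holding src[0..n) as a single stored block.
 * Requires n <= 65535; dst must have room for n + 23 bytes.
 * Returns the number of bytes written. */
static size_t gzip_stored_member(uint8_t *dst, const uint8_t *src, size_t n)
{
    /* ID1, ID2, CM=deflate, FLG=0, MTIME=0 (4 bytes), XFL=0, OS=3 (Unix) */
    static const uint8_t header[10] = {0x1f, 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0x03};
    uint8_t *p = dst;

    memcpy(p, header, sizeof(header));
    p += sizeof(header);
    *p++ = 0x01;                  /* BFINAL=1, BTYPE=00 (stored), padded to a byte */
    *p++ = (uint8_t)n;            /* LEN, little-endian */
    *p++ = (uint8_t)(n >> 8);
    *p++ = (uint8_t)~n;           /* NLEN = one's complement of LEN */
    *p++ = (uint8_t)(~n >> 8);
    memcpy(p, src, n);
    p += n;
    p = put_le32(p, crc32_simple(src, n));  /* CRC32 of the uncompressed data */
    p = put_le32(p, (uint32_t)n);           /* ISIZE: uncompressed length */
    return (size_t)(p - dst);
}

/* Split src into pieces of at most chunk bytes and emit one member per
 * piece, back to back. Returns total bytes written, or 0 if chunk is
 * unusable. Empty input produces no members. */
static size_t gzip_stored_multi(uint8_t *dst, const uint8_t *src, size_t n, size_t chunk)
{
    size_t out = 0;
    if (chunk == 0 || chunk > 65535) return 0;
    while (n > 0) {
        size_t take = n < chunk ? n : chunk;
        out += gzip_stored_member(dst + out, src, take);
        src += take;
        n -= take;
    }
    return out;
}
```

Writing the returned bytes to a file should give something that standard gzip tools accept, since they decompress concatenated members in sequence. With uzlib, the stored block would be replaced by the output of uzlib_compress for each chunk.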
