Compression for uzlib

Maksym Galemin · Post by **Maksym Galemin** » Thu Jan 31, 2019 9:29 am

I'm wondering why the uzlib module doesn't support compression. Admittedly, the current tgzip example (uzlib ver. 2.9.2) has to copy the whole input file into RAM, but looking at the source code I can't find anything that prevents splitting the input file into a number of data chunks and compressing each one separately into its own gzip "member" (or "compressed data set"; see RFC 1951 and RFC 1952 for details). The final compression ratio won't be as good as with a single huge gzip "member", but even with 8192-byte data chunks per gzip "member" the results don't look that bad. In any case, I think it's better to have not very efficient compression in the uzlib module than not to have it at all.

Example main() code from tgzip.c (just a quick hack):

int main(int argc, char *argv[])
{
    FILE *fin, *fout;
    unsigned int len;

    printf("tgzip - example from the uzlib library\n\n");

    if (argc < 3)
    {
       printf(
          "Syntax: tgzip <source> <destination>\n\n"
          "Both input and output are kept in memory, so do not use this on huge files.\n");

       return 1;
    }

    /* -- open files -- */

    if ((fin = fopen(argv[1], "rb")) == NULL) exit_error("source file");

    if ((fout = fopen(argv[2], "wb")) == NULL) exit_error("destination file");

    /* -- read source -- */

    fseek(fin, 0, SEEK_END);

    len = ftell(fin);

    fseek(fin, 0, SEEK_SET);

    unsigned crc = ~0;
    size_t size_remaining = len;
    unsigned char source[8192] = {};
    unsigned int hash_bits = 12;
    size_t hash_table_size = sizeof(uzlib_hash_entry_t) * (1 << hash_bits);
    uzlib_hash_entry_t hash_table[1 << hash_bits];

    while (size_remaining > 0)
    {
        size_t bytes_to_read = (size_remaining > sizeof(source)) ? sizeof(source) : size_remaining;
        if (fread(source, 1, bytes_to_read, fin) != bytes_to_read) exit_error("read");

        /* -- compress data -- */

        struct uzlib_comp comp = {0};
        comp.dict_size = 32768;
        comp.hash_bits = hash_bits;
        comp.hash_table = hash_table;
        memset(comp.hash_table, 0, hash_table_size);

        zlib_start_block(&comp.out);
        uzlib_compress(&comp, source, bytes_to_read);
        zlib_finish_block(&comp.out);

        /* -- write output -- */

        putc(0x1f, fout);
        putc(0x8b, fout);
        putc(0x08, fout);
        putc(0x00, fout); // FLG
        int mtime = 0;
        fwrite(&mtime, sizeof(mtime), 1, fout);
        putc(0x04, fout); // XFL
        putc(0x03, fout); // OS
        
        fwrite(comp.out.outbuf, 1, comp.out.outlen, fout);

        crc = ~uzlib_crc32(source, bytes_to_read, ~0);

        fwrite(&crc, sizeof(crc), 1, fout);
        /* ISIZE is a 4-byte field, while size_t may be 8 bytes wide */
        unsigned int isize = (unsigned int)bytes_to_read;
        fwrite(&isize, sizeof(isize), 1, fout);

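        /* comp is reset for every chunk, so release its output buffer here
           (assuming uzlib grows outbuf with realloc, as in ver. 2.9.2) */
        free(comp.out.outbuf);

        size_remaining -= bytes_to_read;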
    }

    fclose(fin);
    fclose(fout);

    return 0;
}
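
The hack above writes MTIME, CRC32 and ISIZE with fwrite on native integers, which only produces a valid gzip member on a little-endian host with 4-byte fields. A minimal sketch of byte-by-byte framing following RFC 1952 might look like this. The helper names put_le32, gzip_write_header and gzip_write_trailer are hypothetical and are not part of uzlib.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper (not part of uzlib): write a 32-bit value
   least-significant byte first, as RFC 1952 requires. */
static void put_le32(FILE *f, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        putc((int)((v >> (8 * i)) & 0xff), f);
    }
}

/* Hypothetical helper: fixed 10-byte member header with no optional fields. */
static void gzip_write_header(FILE *f)
{
    putc(0x1f, f);   /* ID1 */
    putc(0x8b, f);   /* ID2 */
    putc(0x08, f);   /* CM = deflate */
    putc(0x00, f);   /* FLG: no optional fields */
    put_le32(f, 0);  /* MTIME = 0 (no timestamp available) */
    putc(0x04, f);   /* XFL = 4 (fastest algorithm) */
    putc(0x03, f);   /* OS = 3 (Unix) */
}

/* Hypothetical helper: 8-byte member trailer, CRC32 followed by
   ISIZE (uncompressed length modulo 2^32). */
static void gzip_write_trailer(FILE *f, uint32_t crc, size_t input_len)
{
    put_le32(f, crc);
    put_le32(f, (uint32_t)(input_len & 0xffffffffu));
}
```

In the loop above, gzip_write_header(fout) would stand in for the putc/fwrite header lines and gzip_write_trailer(fout, crc, bytes_to_read) for the two trailing fwrite calls.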

pfalcon · Post by **pfalcon** » Fri Feb 01, 2019 8:51 am

"I'm wondering why uzlib module doesn't support compression?"

Obvious reason would be that nobody did that?

And doing that properly would require elaborating the API of the underlying C library (also called uzlib).

tylerkolden · Post by **tylerkolden** » Sat Jun 08, 2019 4:46 pm

Hi Maksym,

Did you end up getting uzlib to support compression? I am trying to do some streaming compression in MicroPython but have not had much success in finding relevant examples.

Maksym Galemin · Post by **Maksym Galemin** » Mon Jun 10, 2019 9:01 am

Hi tylerkolden,

Yes, I ended up implementing .tar.gz-compatible compression using uzlib and the microtar library. I included a simple example in my original post. Basically, you need to split your input file into a number of data chunks, compress each of them into a separate gzip member, and write all the members one by one into the output archive file.
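
To get a feel for the fixed cost of this approach, here is a rough sketch that counts only the framing bytes added by splitting. It assumes each member carries the plain 10-byte header with no optional fields plus the 8-byte trailer from RFC 1952. The function names are hypothetical, and the estimate does not cover the ratio lost because the compressor starts without any match history in every chunk.

```c
#include <stdio.h>
#include <stddef.h>

/* Framing bytes per gzip member: 10-byte header (no optional fields)
   plus 8-byte trailer (CRC32 + ISIZE). */
#define GZIP_MEMBER_FRAMING 18u

/* Hypothetical helper: number of members needed for input_len bytes
   when every member holds at most chunk_size bytes of input. */
static size_t gzip_member_count(size_t input_len, size_t chunk_size)
{
    return (input_len + chunk_size - 1) / chunk_size;  /* ceiling division */
}

/* Hypothetical helper: total header and trailer bytes across all members. */
static size_t gzip_framing_overhead(size_t input_len, size_t chunk_size)
{
    return gzip_member_count(input_len, chunk_size) * GZIP_MEMBER_FRAMING;
}
```

For a 1 MiB input and the 8192-byte chunks mentioned earlier in the thread, this gives 128 members and 2304 framing bytes, roughly 0.22% of the input size.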

MicroPython Forum (Archive)
