
Compression for uzlib

Posted: Thu Jan 31, 2019 9:29 am
by Maksym Galemin
I'm wondering why the uzlib module doesn't support compression. Granted, in the current implementation of the tgzip example (uzlib ver. 2.9.2) the whole input file is read into RAM, but looking at the source code I can't find anything that would prevent splitting the input file into a number of data chunks and compressing each of them separately into its own gzip "member" (or "compressed data set"; see RFC 1951 and RFC 1952 for details). The final compression ratio won't be as good as with a single huge gzip member, but even with 8192-byte data chunks per member the results don't look that bad. In any case, I think it's better to have not-very-efficient compression in the uzlib module rather than not having it at all.

Example main() code from tgzip.c (just a quick hack):

Code:

int main(int argc, char *argv[])
{
    FILE *fin, *fout;
    unsigned int len;

    printf("tgzip - example from the uzlib library\n\n");

    if (argc < 3)
    {
       printf(
          "Syntax: tgzip <source> <destination>\n\n"
          "Input is compressed in 8192-byte chunks, one gzip member per chunk.\n");

       return 1;
    }

    /* -- open files -- */

    if ((fin = fopen(argv[1], "rb")) == NULL) exit_error("source file");

    if ((fout = fopen(argv[2], "wb")) == NULL) exit_error("destination file");

    /* -- read source -- */

    fseek(fin, 0, SEEK_END);

    len = ftell(fin);

    fseek(fin, 0, SEEK_SET);

    unsigned crc;
    size_t size_remaining = len;
    unsigned char source[8192]; /* one 8 KiB chunk becomes one gzip member */
    unsigned int hash_bits = 12;
    size_t hash_table_size = sizeof(uzlib_hash_entry_t) * (1 << hash_bits);
    uzlib_hash_entry_t hash_table[1 << hash_bits];

    while (size_remaining > 0)
    {
        size_t bytes_to_read = (size_remaining > sizeof(source)) ? sizeof(source) : size_remaining;
        if (fread(source, 1, bytes_to_read, fin) != bytes_to_read) exit_error("read");

        /* -- compress data -- */

        struct uzlib_comp comp = {0};
        comp.dict_size = 32768;
        comp.hash_bits = hash_bits;
        comp.hash_table = hash_table;
        memset(comp.hash_table, 0, hash_table_size); // reset hash table for each member

        zlib_start_block(&comp.out);
        uzlib_compress(&comp, source, bytes_to_read);
        zlib_finish_block(&comp.out);

        /* -- write output -- */

        putc(0x1f, fout); // ID1
        putc(0x8b, fout); // ID2
        putc(0x08, fout); // CM = deflate
        putc(0x00, fout); // FLG
        int mtime = 0;
        fwrite(&mtime, sizeof(mtime), 1, fout); // MTIME (assumes 32-bit little-endian int)
        putc(0x04, fout); // XFL
        putc(0x03, fout); // OS = Unix
        
        fwrite(comp.out.outbuf, 1, comp.out.outlen, fout);
        free(comp.out.outbuf); // the compressor grows this buffer on the heap; free it each iteration

        crc = ~uzlib_crc32(source, bytes_to_read, ~0);

        fwrite(&crc, sizeof(crc), 1, fout);
        /* ISIZE must be exactly 4 bytes; writing a size_t would emit 8
           bytes on 64-bit hosts and corrupt the stream. (Both trailer
           fields also assume a little-endian host here.) */
        unsigned int isize = (unsigned int)bytes_to_read;
        fwrite(&isize, sizeof(isize), 1, fout);

        size_remaining -= bytes_to_read;
    }

    fclose(fin);
    fclose(fout);

    return 0;
}

Re: Compression for uzlib

Posted: Fri Feb 01, 2019 8:51 am
by pfalcon
Maksym Galemin wrote:
I'm wondering why uzlib module doesn't support compression?
Obvious reason would be that nobody did that?

And doing that properly would require elaborating the API of the underlying C library (also called uzlib).