Zlib Compression
Format type | Compression algorithm |
---|---|
Type | Stream |
I/O unit size | 1-bit |
Games |
Zlib Compression is used by a number of programs to compress data, notably Jazz Jackrabbit 2. Unlike many compression systems, Zlib has no single way to compress data. (Though decompression is easier, since the compressed data will indicate its compression method.)
Zlib is an implementation of the DEFLATE compression system and there are many informative documents about both formats as they are somewhat a de facto standard form of compression. Basically Zlib is a combination of Huffman and LZW Compression in which data can be compressed in three ways:
- Not at all (data that can't be compressed further)
- LZW then Huffman (using a Huffman binary tree defined by the Zlib format)
- Huffman then LZW (with Huffman binary trees stored in the data)
It is notable that Zlib places restrictions on the Huffman trees it uses, notably:
- Shorter codes are left of longer codes
- With codes of equal length, those that come first in the ASCII character set are placed leftmost.
This means that there is usually only one possible Huffman binary tree for any given set of data, eliminating the need to include the tree (dictionary) with the data. It is also notable that when LZW data is compressed by Huffman the 'alphabet' contains both all the literals being used, length flags and the 'end of block' flag (but not distance flags.) However, despite this, the Huffman tree is still restricted to 254 nodes (code lengths between 0-15 bits.)
Custom Huffman trees are even more complex. Since there is only one tree for any given set of data, the tree is sent as code lengths (values between 0-15) which are then RLE compressed, giving a series of values between 0-18 bits THIS data is then Huffman compressed using a standard (and quite small) tree which is included with the data.
Decompression
As expected, decompression is complex. Fortunately there are a number of utilities and addons that can be used in lieu of a programmer writing their own code.