Carmack compression

From ModdingWiki
Jump to navigation Jump to search
Carmack compression
Format typeCompression algorithm
I/O unit size1-byte

Carmack compression is used in the GAMEMAPS file in a number of games to further shrink the levels down beyond what RLEW compression can achieve. Its basic idea is somewhat like LZ (Lempel-Ziv) compression in that it contains pointers back to previous data.

As in RLEW compression, the first word in the Carmack compressed data is the number of bytes (not words) in the decompressed data. This is typically the number of bytes in the compressed RLEW data, as Carmack compression is performed after RLEW compression.

The format specifies some values in words - these are 16-bits in length (UINT16LE).

Carmack compression contains two types of references to previous data: near pointers and far pointers.

Near Pointers

Near pointers occupy three bytes in the compressed data. The first is the number of words in the referenced sequence, the second is the signal byte of 0xA7, and the third is the number of words to the start of the reference (counting backwards from the current location). As a concrete example, the three bytes 0x05 0xA7 0x0A mean 'repeat the 5 words starting 10 words ago'.

Notice that near pointers only let one refer to the last 255 words. To refer to sequences further back, one must use far pointers.

Far Pointers

Far pointers occupy four bytes in the compressed data. The first is, again, the number of words in the referenced sequence, the second is 0xA8, and the third and fourth are interpreted as a UINT16LE word - a 0-based pointer to the start of the reference in words, so the address must be multiplied by two to reach the correct byte location. As a concrete example, the four bytes 0x10 0xA8 0x01 0x02 mean 'repeat the 16 words starting 513 words before the current output position'.

Words with a high byte of 0xA7 or 0xA8

Words whose high (second) byte is 0xA7 or 0xA8 would appear to be issue, as they would be confused with near or far pointers. These are handled by representing them as the three bytes: 00 Ax yy, where yy is the lower 8 bits of the word being escaped. This sequence is recognized as an exception, as repeating zero words would make no sense.

For example:

   00 A7 12 EE FF 00 A8 34 CC DD

Decodes to:

   12 A7 EE FF 34 A8 CC DD

Example code