id Software RLEW compression

From ModdingWiki
Jump to navigation Jump to search

id Software RLEW compression was used in a number of id's early games. It is a flag based RLE Compression algorithm that works at the word level, where each word is a group of two bytes (16 bits). The algorithm is used on files that themselves use 16-bit words as their basic units, such as game levels, where a single 16-bit value such as "12 34" might be repeated multiple times in a row. A byte-level RLE algorithm would not be able to compress this data at all, as each byte is different to the previous, however the same RLE algorithm operating at the 16-bit word level can achieve good compression.

The use of 16-bit values over 8-bit also yields other benefits, such as the maximum repetition count increasing from around 256 to 65,536, allowing even greater compression of data with large numbers of repeated values. The RLE flag word can also be chosen to be less likely to appear in the data, minimising the number of escapes required. While 8-bit RLE has a 1 in 256 chance of a given byte needing to be escaped, 16-bit RLE reduces that to 1 in 65,536.

RLEW compression is almost exclusively used to compress level data. Most revisions of the early game engines by id Software used arrays of 16 bit values to store the level data.

Algorithm

The algorithm is a very straightforward flag-based RLE. Every 16-bit value is passed through unchanged until an RLE flag value is encountered. Typically this is 0xFEFE or 0xABCD (the latter stored as the two bytes 0xCD 0xAB in little-endian order). When this flag word is read, instead of writing it out like the other values, two further words are read. The first is the number of times to repeat the word, and the second is the value to be repeated.

All 16-bit words are in little-endian order, although in practice only the repeat count need to be treated as little-endian. The flag value simply has to match, and the copied data simply needs to be copied unchanged.

Data is usually read from the start of a file to the end, but many implementations have the size of the decompressed data as a UINT16LE at the beginning of the file. This should be used where possible so that trailing data at the end of the file does not inadvertently get read as input data. In cases where this does not apply, the size should be predetermined by the expected output size. For example, Blake Stone: Aliens of Gold has files where the output is always 64x64 words.

A simple algorithm for decompressing this format is as follows;

  1. If implemented, get the first word in the file (here called finalLength)
  2. While the length of the output data is less than finalLength:
    1. Read a word from the input data
    2. Is this word the RLE flag word? (e.g. 0xFEFE)
      • If yes;
        1. Get the next two words, the first as Count and the second as Value.
        2. Write Value as a word, Count times.
        3. Move forward three words and go to 2.
      • If no;
        1. Write the word to the output.
        2. Move forward a word and go to 2.

Minimal compression

As only the flag word needs to be handled specially, a quick compression function can be implemented by passing through all values as-is, and if the flag word has just been written then it is followed by the additional words 0x0001 and the flag word again. For example if the flag word is 0xFEFE, then it will end up in the output as FEFE 0001 FEFE, meaning "RLE sequence begins: write 0xFEFE once".

Inefficiencies and hidden data

As only the flag word needs to be escaped, no other word should ever need to be repeated only once. This means the sequence FEFE 0001 FEFE could have been shortened to FEFE 0001 by omitting the field containing the word to repeat, with the assumption that the only word ever written just once is the RLEW flag word itself. However, no such optimisation is done.

The Count field in the RLE sequence can be set to 0 to write a word zero times, effectively discarding the word. In theory, this could allow "hidden" data to be embedded in the compressed data stream that would be skipped over by the decompressor and not affect the output decompressed data.

RLEB compression

This is almost exactly the same as RLEW compression. The only difference is that it operates on bytes instead of words. The flag value for this compression is $FE in most cases.

Dangerous Dave 2, Rescue Rover and Shadow Knights use this method to compress some of their full-screen images. Rescue Rover and Shadow Knights also use this method to compress the game's sprite graphics.

Example code

  • Javascript: the cmp-rlew-id algorithm in gamecomp.js (full compression and decompression)