All Dogs RLE Compression
The compressed data is processed in a loop. After each byte is read, its value is used to select an action. If the byte has a value of 27 (0x1B) then the next two bytes form a big-endian word value that gives the number of identical bytes to output, and the byte after that is the byte to repeat. All other bytes are passed through unchanged. The next byte is then read and the same procedure followed until there is no more data or the decompressed size has been reached.
12 34 1B 00 07 56 78 90
This data is interpreted as follows:
12 34 # Copy 0x12 and 0x34 through unchanged 1B 00 07 56 # Repeat 0x56 seven times 78 90 # Copy 0x78 and 0x90 through unchanged
The decompression in simple steps:
- While read position < compressed size and write position < decompressed size, repeat:
- Read a byte value (increasing read position to the next byte).
- If the byte value = 27, then:
- Read the next two bytes (increasing read position two bytes).
- Store this as big-endian value length.
- Read the next byte as repeat value (increasing read position to the next byte).
- Write the repeat value byte to the output length times (increasing the write position with length).
- Write the value byte to the output (increasing the write position by one).
Analysis and notes
It can be seen that this system is very far from optimal, since two, three and even four repeating bytes will not result in actually reduced file size. To avoid increasing the size, the compression doesn't store repeats smaller than four bytes as flags. Note that 4-byte repeats are stored as repeat flags; perhaps processing a flag is faster than reading and evaluating four separate bytes.
This poor compression scheme means that for CGA graphics, you need a minimum of 20 pixels that all have the same bytes (so 5 identical repeating blocks of 4 pixels) before any real compression can occur. Recompressing all existing archives with an identical compression, but adjusted to use only one byte for the repeat, already gives a file size decrease of 10%. The large open spaces on some of the images, like the title screen and dance animation in the intro, do mean both bytes of the repeat length are used quite often, though one could wonder if some strategic cropping of the graphics wouldn't have had the same effect on most such scenes.
Of course, all this may be a balancing of processing power versus file size; using images that are the full screen width avoids extra calculations for the positioning, and 1-byte repeat lengths would require multiple repeat flags to fill up space that can be filled in a single operation now, which is most likely a single CPU instruction. Perhaps the potential file size decrease wasn't worth having more processor-intensive decompression.