Westwood LCW
Format type | Compression algorithm |
---|---|
Type | Stream |
I/O unit size | 1-byte |
Games |
The Westwood LCW compression algorithm was Westwood Studios' response to Unisys' enforcement of their patent on the LZW algorithm. Before its real name was known to the fanbase, the compression format was generally referred to as "Format 80", because it is indicated with byte value 0x80 in the frame headers of Command & Conquer 1 SHP files, and because all compressed content ends on byte 0x80.
The internally used name, LCW[1], stands for Lempel–Castle–Welch, a variation on Lempel–Ziv–Welch, with "Castle" referring to Louis Castle, who wrote the algorithm. It is unknown why this name was chosen, though, since neither the algorithm nor its output resemble LZW in any way. In fact, its final compressed data is much more similar to Code-based RLE compression, but with a much-extended set of commands. It is entirely possible the only reason for the name is that the algorithm was their replacement for LZW.
Explanation
There are several different commands, with different sizes, from 1 to 5 bytes. The positions mentioned below always refer to the destination buffer (i.e. the uncompressed image). The relative positions are relative to the current position in the destination buffer, which is one byte beyond the last written byte.
If the data starts with byte 0, then Command 3 and Command 5 are relative, meaning the final position is the current destination position minus the given position value. This system was used to compress larger data, like the VQA videos, in later games.
Note: in the following explanations, a dot (.
) refers to a single bit, and a double underscore (__
) means a whole byte.
Command 1: Short copy
This command is 1 byte long.
+---+---+---+---+---+---+---+---+ | 1 | 0 | . | . | . | . | . | . | +---+---+---+---+---+---+---+---+ \_____________________/ | Count
This one means : copy next Count
bytes as is from Source
to Dest
.
A Count
value of 0, so a byte 80h, indicates the end of the compressed data is reached.
Command 2: Existing block relative copy
This command is 2 bytes long.
+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | 0 | . | . | . | . | . | . | . | | . | . | . | . | . | . | . | . | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ \_________/ \_________________________________________________/ | | Count-3 Pos
This means, copy Count
bytes from Dest
at Current Position - Pos
to Current Position
. Note that you have to add 3 to the number you find in the bits 4-6 of the first byte to obtain the Count
. If the Pos
is 1, that means repeat Count
times the previous byte.
Command 3: Existing block medium-length copy
This command is 3 bytes long. Whether it is absolute or relative depends on the first byte in the data being 0 to mark the data as using relative offsets.
+---+---+---+---+---+---+---+---+ +----+----+ | 1 | 1 | . | . | . | . | . | . | | __ | __ | +---+---+---+---+---+---+---+---+ +----+----+ \_____________________/ Pos | Count-3
In absolute mode, this should copy Count
bytes from Pos
in the destination buffer to the Current Position
in Dest
, where Pos
is absolute from the start of the destination buffer. Since Pos
is a word, this means that the compressed data can't be larger than 64K.
In relative mode, this should copy Count
bytes from Dest
at Current Position - Pos
to Current Position
in Dest
.
Because Count
values 0b111110 and 0b111111 have special meanings (see below), the maximum value that can be stored in this is 0b111101. With the +3 applied to it to get the full value, this results in a maximum Count
value of 64.
Command 4: Repeat value
This command is 4 bytes long.
+---+---+---+---+---+---+---+---+ +----+----+ +----+ | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | | __ | __ | | __ | +---+---+---+---+---+---+---+---+ +----+----+ +----+ Count Value
Write Value
Count
times. (Count
is a word, Value
is a byte)
Command 5: Existing block long copy
This command is 5 bytes long. Whether it is absolute or relative depends on the first byte in the data being 0 to mark the data as using relative offsets.
+---+---+---+---+---+---+---+---+ +----+----+ +----+----+ | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | | __ | __ | | __ | __ | +---+---+---+---+---+---+---+---+ +----+----+ +----+----+ Count Pos
In absolute mode, copy Count
bytes from Dest
starting at Pos
. Pos
is absolute from the start of the Dest
buffer. Both Count
and Pos
are words.
In relative mode, this should copy Count
bytes from Dest
at Current Position - Pos
to Current Position
in Dest
.
Code
Decompression
ASM
The original X86 Assembler code for decompressing LCW can be found in DrawMisc.cpp, in the official source code release of Command & Conquer.
C++
Westwood's official C++ conversion of their original ASM code for decompressing LCW can be found in LCW.CPP, in the official source code release of Command & Conquer: Red Alert.
Pascal
This is the decompression code written by Vladan Bato when he figured out the format in his research into the first Command & Conquer game.
It was adapted to incorporate support for relative commands based on information provided by Gordan Ugarkovic in his research of the VQA video format.
- DP = destination pointer
- SP = source pointer
- rel = value that is 0 to indicate that commands 3 and 5 are relative
- Source and Dest are the two buffers
SP:=0;
DP:=0;
rel:=Source[SP];
if rel=0 then inc(SP); { if relative, skip indicator byte }
repeat
Com:=Source[SP];
inc(SP);
b7:=Com shr 7; {b7 is bit 7 of Com}
case b7 of
0 : begin {copy command (2)}
{Count is bits 4-6 + 3}
Count:=(Com and $7F) shr 4 + 3;
{Position is bits 0-3, with bits 0-7 of next byte}
Posit:=(Com and $0F) shl 8+Source[SP];
Inc(SP);
{Starting pos=Cur pos. - calculated value}
Posit:=DP-Posit;
for i:=Posit to Posit+Count-1 do
begin
Dest[DP]:=Dest[i];
Inc(DP);
end;
end;
1 : begin
{Check bit 6 of Com}
b6:=(Com and $40) shr 6;
case b6 of
0 : begin {Copy as is command (1)}
Count:=Com and $3F; {mask 2 topmost bits}
if Count=0 then break; {EOF marker}
for i:=1 to Count do
begin
Dest[DP]:=Source[SP];
Inc(DP);
Inc(SP);
end;
end;
1 : begin {large copy, very large copy and fill commands}
{Count = (bits 0-5 of Com) +3}
{if Com=FEh then fill, if Com=FFh then very large copy}
Count:=Com and $3F;
if Count<$3E then {large copy (3)}
begin
Inc(Count,3);
{Next word = pos. from start of image}
if rel=0 then { relative }
Posit:=DP-Word(Source[SP])
else
Posit:=Word(Source[SP]);
Inc(SP,2);
for i:=Posit to Posit+Count-1 do
begin
Dest[DP]:=Dest[i];
Inc(DP);
end;
end
else if Count=$3F then {very large copy (5)}
begin
{next 2 words are Count and Pos}
Count:=Word(Source[SP]);
if rel=0 then { relative }
Posit:=DP-Word(Source[SP+2])
else
Posit:=Word(Source[SP+2]);
Inc(SP,4);
for i:=Posit to Posit+Count-1 do
begin
Dest[DP]:=Dest[i];
Inc(DP);
end;
end
else
begin {Count=$3E, fill (4)}
{Next word is count, the byte after is color}
Count:=Word(Source[SP]);
Inc(SP,2);
b:=Source[SP];
Inc(SP);
for i:=0 to Count-1 do
begin
Dest[DP]:=b;
inc(DP);
end;
end;
end;
end;
end;
end;
until false;
Note that you won't be able to compile this code, because the typecasting won't work. (But I'm sure you'll be able to fix it).
Compression
Several tools created by the Command & Conquer modding community, like XCC Mixer, OS SHP Builder, and the older DOS tools, like RAMIX and Vladan Bato's own Mix Manager, can write LCW, but these tools often use either very simple compression, or use the 'copy' commands and perform no real compression at all.
A reconstruction of the original LCW and XOR Delta compression algorithms was written by OmniBlade of the Assembly Armada team, aided by descriptions of the used principles by Louis Castle himself, and was released as C++ code on the CnCNet forums. This code achieves exactly the same compression as the original files inside the Westwood games.
A C# version of the code is available in the source code of Nyerguds's Engie File Converter, which handles a multitude of Westwood Studios formats.
OmniBlade's C++ code: (released under the GNU General Public License)
////////////////////////////////////////////////////////////////////////////////
// Some utility functions to get worst case sizes for buffer allocation
////////////////////////////////////////////////////////////////////////////////
inline int LCW_Worst_Case(int datasize){
return datasize + (datasize / 63) + 1;
}
////////////////////////////////////////////////////////////////////////////////
// Some defines used by the encoders
////////////////////////////////////////////////////////////////////////////////
#if !defined(UINT16_MAX)
#define UINT16_MAX 0xFFFFu
#endif
////////////////////////////////////////////////////////////////////////////////
// NAME:
// LCW_Comp()
//
// DESCRIPTION:
// Compresses data to the proprietary LCW format used in
// many games developed by Westwood Studios. Compression is better
// than that achieved by popular community tools. This is a new
// implementation based on understanding of the compression gained from
// the reference code.
//
// PARAMETERS:
// input -- pointer to the data to compress.
// output -- pointer to a buffer to store the compressed data.
// size -- size of the data to compress.
//
// RETURN:
// Length of compressed data.
//
// WARNINGS:
// Compressed can be larger than the source buffers in worse case, allocate
// storage accordingly.
//
////////////////////////////////////////////////////////////////////////////////
int LCW_Comp(void const *input, void *output, unsigned long size)
{
//Decide if we are going to do relative offsets for 3 and 5 byte commands
bool relative = size > UINT16_MAX ? true : false;
if ( !size || !input || !output) {
return 0;
}
const uint8 *getp = static_cast<const uint8 *>(input);
uint8 *putp = static_cast<uint8 *>(output);
const uint8 *getstart = getp;
const uint8 *getend = getp + size;
uint8 *putstart = putp;
//relative LCW starts with 0 as flag to decoder.
//this is only used by later games for decoding hi-color vqa files
if ( relative ) {
*putp++ = 0;
}
//Implementations that properly conform to the WestWood encoder should
//write a starting cmd1. Its important for using the offset copy commands
//to do more efficient RLE in some cases than the cmd4.
//we also set bool to flag that we have an on going cmd1.
uint8 *cmd_onep = putp;
*putp++ = 0x81;
*putp++ = *getp++;
bool cmd_one = true;
//Compress data until we reach end of input buffer.
while ( getp < getend ) {
//Is RLE encode (4bytes) worth evaluating?
if ( getend - getp > 64 && *getp == *(getp + 64) ) {
//RLE run length is encoded as a short so max is UINT16_MAX
const uint8 *rlemax = (getend - getp) < UINT16_MAX ? getend : getp + UINT16_MAX;
const uint8 *rlep;
for ( rlep = getp + 1; *rlep == *getp && rlep < rlemax; ++rlep);
uint16 run_length = rlep - getp;
//If run length is long enough, write the command and start loop again
if ( run_length >= 0x41 ) {
//write 4byte command 0b11111110
cmd_one = false;
*putp++ = 0xFE;
*putp++ = run_length;
*putp++ = run_length >> 8;
*putp++ = *getp;
getp = rlep;
continue;
}
}
//current block size for an offset copy
int block_size = 0;
const uint8 *offstart;
//Set where we start looking for matching runs.
if ( relative ) {
offstart = (getp - getstart) < UINT16_MAX ? getstart : getp - UINT16_MAX;
} else {
offstart = getstart;
}
//Look for matching runs
const uint8 *offchk = offstart;
const uint8 *offsetp = getp;
while ( offchk < getp ) {
//Move offchk to next matching position
while ( offchk < getp && *offchk != *getp ) {
++offchk;
}
//If the checking pointer has reached current pos, break
if ( offchk >= getp ) {
break;
}
//find out how long the run of matches goes for
int i;
for ( i = 1; &getp[i] < getend; ++i ) {
if ( offchk[i] != getp[i] ) {
break;
}
}
if ( i >= block_size) {
block_size = i;
offsetp = offchk;
}
++offchk;
}
//decide what encoding to use for current run
//If its less than 2 bytes long, we store as is with cmd1
if ( block_size <= 2 ) {
//short copy 0b10??????
//check we have an existing 1 byte command and if its value is still
//small enough to handle additional bytes
//start a new command if current one doesn't have space or we don't
//have one to continue
if ( cmd_one && *cmd_onep < 0xBF) {
//increment command value
++*cmd_onep;
*putp++ = *getp++;
} else {
cmd_onep = putp;
*putp++ = 0x81;
*putp++ = *getp++;
cmd_one = true;
}
//Otherwise we need to decide what relative copy command is most efficient
} else {
uint16 offset;
uint16 rel_offset = getp - offsetp;
if ( block_size > 0xA || ((rel_offset) > 0xFFF) ) {
//write 5 byte command 0b11111111
if ( block_size > 0x40 ) {
*putp++ = 0xFF;
*putp++ = block_size;
*putp++ = block_size >> 8;
//write 3 byte command 0b11??????
} else {
*putp++ = (block_size - 3) | 0xC0;
}
offset = relative ? rel_offset : offsetp - getstart;
//write 2 byte command? 0b0???????
} else {
offset = rel_offset << 8 | (16 * (block_size - 3) + (rel_offset >> 8));
}
*putp++ = offset;
*putp++ = offset >> 8;
getp += block_size;
cmd_one = false;
}
}
//write final 0x80, basically an empty cmd1 to signal the end of the stream.
*putp++ = 0x80;
//return the size of the compressed data.
return putp - putstart;
}
References
- ↑ IFF.H in the Command & Conquer source code: LCW // Westwood proprietary compression.
- All file formats
- Compression algorithms
- BattleTech: The Crescent Hawk's Inception
- BattleTech: The Crescent Hawk's Revenge
- Eye of the Beholder
- Eye of the Beholder II: The Legend of Darkmoon
- Dune II
- The Legend of Kyrandia
- Lands of Lore: The Throne of Chaos
- The Legend of Kyrandia: The Hand of Fate
- The Legend of Kyrandia: Malcolm's Revenge
- Command & Conquer
- Monopoly
- Command & Conquer: Red Alert
- Toonstruck
- Lands of Lore 2: Guardians of Destiny
- Blade Runner
- Command & Conquer: Sole Survivor
- Command & Conquer: Tiberian Sun
- Lands of Lore III
- Nox
- Command & Conquer: Red Alert 2
- Westwood Studios File Formats