PKLite is a utility that compresses an executable file into a smaller, self-extracting form. It has been used in a few games, although LZEXE was more popular.
A PKLite compressed executable can be recognized by the presence of the "PKWARE" copyright string near the start of the .exe file, although sometimes this has been changed to hide the fact that PKLite has been used.
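The check described above can be sketched as follows. This is only a heuristic under stated assumptions: the function name `looks_like_pklite` and the `search_window` size of 0x200 bytes are hypothetical choices for illustration, and a negative result is not conclusive because the string may have been altered.

```python
def looks_like_pklite(exe_bytes: bytes, search_window: int = 0x200) -> bool:
    """Heuristic test for a PKLite-compressed .exe.

    search_window is an assumed value for "near the start of the file";
    it is not taken from any specification.
    """
    # A PKLite file is still a standard DOS EXE, so it starts with "MZ".
    if exe_bytes[:2] != b"MZ":
        return False
    # Look for the copyright marker in the first part of the file only.
    return b"PKWARE" in exe_bytes[:search_window]
```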
There are a number of utilities that can decompress PKLite .exe files, including PKLite itself with the -x option.
PKLite itself uses a version of the LZSS algorithm tweaked to operate more at the byte level to make decompression quicker, with a Huffman table used for the LZSS offset and length values to further reduce their size.
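To illustrate the general LZSS idea only, here is a minimal sketch of decoding a stream of already-parsed tokens. It is not PKLite's actual bit layout or Huffman table, which this sketch does not attempt to reproduce; the token representation (an int for a literal byte, a `(distance, length)` tuple for a back-reference) is an assumption made for the example.

```python
def lzss_decode(tokens):
    """Decode generic LZSS tokens into bytes.

    tokens: iterable of either an int (literal byte) or a
    (distance, length) tuple (copy from earlier output).
    This token format is hypothetical, not PKLite's on-disk encoding.
    """
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)  # literal byte, copied as-is
            continue
        distance, length = token
        if distance <= 0 or distance > len(out):
            raise ValueError("back-reference points outside the output")
        start = len(out) - distance
        # Copy byte by byte so that overlapping references
        # (length > distance) repeat the data just written.
        for i in range(length):
            out.append(out[start + i])
    return bytes(out)
```

For example, `lzss_decode([0x61, 0x62, (2, 4)])` returns `b"ababab"`: two literals followed by a four-byte copy that starts two bytes back and overlaps its own output.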
The file is in standard DOS EXE Format. The block of data that normally contains executable code instead contains a small decompression routine, followed by the compressed data.
The file is laid out in the following order:
- Normal MZ .exe header
- Extra space for the PKWARE copyright header
- Extra space for the original .exe header, minus the "MZ" signature
- Decompression code in the normal .exe code block
- Compressed data follows the decompression code, still in the normal .exe code block
- After the compressed code there is a compressed relocation table using a different compression algorithm
- After the compressed relocation table there is some trailing data of unknown purpose
The offsets for the various structures are as follows.
See EXE Format for the normal MZ header. The fields referred to in the following text are from the header described on that page.
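The three MZ header fields used in the rest of this description can be read as sketched below. The byte offsets (0x06 for numReloc, 0x08 for pgHeader, 0x18 for offReloc) are those of the standard MZ header layout and are assumed to match the fields on the EXE Format page; the names `MZFields` and `read_mz_fields` are hypothetical.

```python
import struct
from typing import NamedTuple


class MZFields(NamedTuple):
    num_reloc: int  # numReloc: number of relocation table entries
    pg_header: int  # pgHeader: header size in 16-byte paragraphs
    off_reloc: int  # offReloc: byte offset of the relocation table


def read_mz_fields(exe_bytes: bytes) -> MZFields:
    """Read numReloc, pgHeader and offReloc from an MZ header."""
    if len(exe_bytes) < 0x1C or exe_bytes[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # All fields are little-endian 16-bit values.
    num_reloc, pg_header = struct.unpack_from("<HH", exe_bytes, 0x06)
    (off_reloc,) = struct.unpack_from("<H", exe_bytes, 0x18)
    return MZFields(num_reloc, pg_header, off_reloc)
```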
Within the MZ header, the documented fields finish at offset 0x1C. PKLite adds some extra fields at this offset:
- At offset 0x1C from the start of the .exe there is a UINT8 for the minor version number, followed by another UINT8 for the major version number. Only the lower 4 bits of the major version byte appear to hold the version number, so bytes 01 02 mean PKLite version 2.1 and bytes 05 21 mean PKLite version 1.5. The fifth (0x10) and sixth (0x20) bits of the major version byte are flags: 0x10 means "extra" compression (which PKLite will refuse to decompress) and 0x20 means "large" compression (see below). Use majorVersion & 0x0F to extract the actual major version number.
- Following this, at offset 0x1E there is a copyright message that continues until the start of the relocation table (offReloc from the .exe header).
- At offReloc (again as a byte offset relative to the start of the .exe) the relocation table begins. Technically numReloc * 4 bytes should be skipped to jump over the relocation table; however, there are always zero entries in it. Thus "after" the relocation table (at the same offset) there is extra data added by PKLite. This data is the original .exe header, minus the "MZ" bytes. To restore the original .exe file, write out "MZ" followed by this extra data. The extra data continues until the end of the PKLite .exe header (pgHeader * 16).
- The relocation table should then be written to the .exe file but we don't have this yet as it is located after the compressed code.
- After this, at offset pgHeader * 16 the decompression routine starts. By seeking a further 0x4E bytes into this code and reading a UINT8, the offset of the compressed data can be obtained (it has not yet been confirmed that this 0x4E offset is the same for all PKLite versions). This value is in paragraphs (16-byte blocks) from the start of the memory address where the .exe is loaded, so some calculations are necessary to convert it to an offset in the .exe file. This is because the value is relative to address 0x100 as this is where all .exe files have their code loaded, but we need to change it to be relative to where the code begins in the .exe file itself, which is at offset pgHeader * 16. The calculation for this is offset_of_compressed_data = (pgHeader + byte_from_offset_0x4e - 0x10) * 16.
- Once the compressed data has been decompressed (see example code below) and the 0xFF special code has been received indicating the end of the data, the compressed relocation table can be read. Simply perform the following in a loop:
- Once the relocation table has been written to the .exe (after the original header), the following data remains after the compressed relocation table, here referred to as the PKLite footer:
| Data type | Name | Description |
|---|---|---|
| UINT16LE | final_segSS | Final relative value for SS |
| UINT16LE | final_regSP | Final value for SP |
| UINT16LE | final_segCS | Final relative value for CS |
| UINT16LE | final_regIP | Final IP to jump to |
This is for loading into the registers normally initialised by DOS when an EXE file is loaded, such that once the decompression process is finished, it looks like everything was set up as normal. These values are repeated here even though they are identical to the values in the original header because the original header is not loaded into memory by DOS. These values are not required to rewrite the original .exe file.
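Reading the footer can be sketched as below. The field names follow the table above; the names `PKLiteFooter` and `read_pklite_footer` are hypothetical, and the caller is assumed to already know the byte offset at which the compressed relocation table ends.

```python
import struct
from typing import NamedTuple


class PKLiteFooter(NamedTuple):
    final_segSS: int  # final relative value for SS
    final_regSP: int  # final value for SP
    final_segCS: int  # final relative value for CS
    final_regIP: int  # final IP to jump to


def read_pklite_footer(data: bytes, offset: int) -> PKLiteFooter:
    """Read four consecutive UINT16LE values starting at offset.

    offset is assumed to be the first byte after the compressed
    relocation table; finding it is outside the scope of this sketch.
    """
    return PKLiteFooter(*struct.unpack_from("<4H", data, offset))
```

Since these values duplicate the ones in the original header, a decompressor could use them as a consistency check rather than as input.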
- After writing out the relocation table, 0x00 bytes must be inserted to pad the data up until the code begins. The code starts at the offset indicated by pgHeader * 16, using the pgHeader value from the restored original header rather than the PKLite header, so 0x00 bytes must be added until this offset is reached.
- After the padding, the decompressed code can be written and the original .exe file has been restored.
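The header-related steps above can be sketched as follows. This is a sketch under stated assumptions, not a complete unpacker: all function names are hypothetical, the decompressed code and the decoded relocation table are taken as ready-made inputs, and the 0x4E offset is used as described above even though it has not been confirmed for every PKLite version.

```python
import struct


def decode_version(exe_bytes: bytes) -> dict:
    """Split the two version bytes at 0x1C into version and flags."""
    minor = exe_bytes[0x1C]
    major_raw = exe_bytes[0x1D]
    return {
        "major": major_raw & 0x0F,       # lower 4 bits hold the version
        "minor": minor,                  # left unmasked in this sketch
        "extra": bool(major_raw & 0x10), # "extra" compression flag
        "large": bool(major_raw & 0x20), # "large" compression flag
    }


def compressed_data_offset(exe_bytes: bytes, pg_header: int) -> int:
    """Apply (pgHeader + byte_from_offset_0x4e - 0x10) * 16."""
    byte_0x4e = exe_bytes[pg_header * 16 + 0x4E]
    return (pg_header + byte_0x4e - 0x10) * 16


def original_header(exe_bytes: bytes, off_reloc: int,
                    num_reloc: int, pg_header: int) -> bytes:
    """Restore the original header: "MZ" plus PKLite's extra data."""
    start = off_reloc + num_reloc * 4  # num_reloc is expected to be 0
    return b"MZ" + exe_bytes[start:pg_header * 16]


def rebuild_exe(header: bytes, reloc_table: bytes, code: bytes) -> bytes:
    """Concatenate header, relocation table, zero padding and code.

    reloc_table and code are assumed to be already decompressed.
    """
    # pgHeader of the ORIGINAL file sits at offset 0x08 of its header.
    (orig_pg_header,) = struct.unpack_from("<H", header, 0x08)
    code_start = orig_pg_header * 16
    out = header + reloc_table
    if len(out) > code_start:
        raise ValueError("header and relocations overrun the code start")
    return out + bytes(code_start - len(out)) + code
```

Keeping each step in its own function makes it easier to check intermediate values (for example, the computed offset) against a hex dump of a real file.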
The algorithm was originally documented as the OpenTESArena PKLite specification by Dozayon. It works for most files with "large" compression but there are still some edge cases it does not work for, notably those files that do not have the flag set for "large" compression.
Example source code
A number of utilities were released in the 1990s that could decompress PKLite .exe files; however, these either patched and ran the x86 machine-code decompressor in the .exe file or did not come with source code, so they are of no help for documenting the decompression process here.
The following examples are the original source for the next generation of decompressors - those that are open source. Note that these only decompress the machine code block so that data from it can be extracted. They cannot write out a working decompressed .exe as that was not their use case:
- C++: OpenTESArena unpacker
- C: depklite, a C conversion by NY00123 of the OpenTESArena unpacker
- C: hackerb9 depklite, a command-line interface to NY00123's depklite
The following example source code is able to decompress the whole file and produce a working decompressed .exe:
The compression algorithm was reverse engineered and documented for OpenTESArena by Dozayon. Malvineous worked out how to find the start of the compressed data and how to write out the original .exe header and relocation table.