PKLite

From ModdingWiki
Platform: MS-DOS
Release date: 1990
Homepage: pkware.com
Download: archiveteam.org
Games: N/A

PKLite is a utility that compresses an executable file into a smaller, self-extracting form. It has been used in a few games, although LZEXE was more popular.

A PKLite compressed executable can be recognized by the presence of the "PKWARE" copyright string near the start of the .exe file, although sometimes this has been changed to hide the fact that PKLite has been used.
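As a rough illustration of the check described above, the following Python sketch looks for the "PKWARE" string in the first part of the file. The function name `looks_like_pklite` and the 0x200-byte search window are assumptions made for this example, not values from any PKLite documentation. Because the string is sometimes altered, a negative result does not prove the file is uncompressed.

```python
def looks_like_pklite(exe: bytes, window: int = 0x200) -> bool:
    """Heuristic: an MZ file with "PKWARE" near the start is probably PKLite."""
    # Only DOS executables are of interest here.
    if exe[:2] != b"MZ":
        return False
    # The window size is an arbitrary assumption for "near the start".
    return b"PKWARE" in exe[:window]
```

A typical call would be `looks_like_pklite(open(path, "rb").read())`, where `path` is whatever file is being examined.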

There are a number of utilities that can decompress PKLite .exe files, including PKLite itself with the -x option.

PKLite itself uses a version of the LZSS algorithm tweaked to operate more at the byte level to make decompression quicker, with a Huffman table used for the LZSS offset and length values to further reduce their size.

File structure

The file is in standard DOS EXE Format. The block of data that normally contains executable code instead contains a small decompression routine, followed by the compressed data.

The file is laid out in the following order:

  • Normal MZ .exe header
    • Extra space for the PKWare copyright header
    • Extra space for the original .exe header, minus the "MZ" signature
  • Decompression code in the normal .exe code block
    • Compressed data follows the decompression code, still in the normal .exe code block
    • After the compressed code there is a compressed relocation table using a different compression algorithm
    • After the compressed relocation table there is some trailing data of unknown purpose

The offsets for the various structures are as follows.

See EXE Format for the normal MZ header. The fields referred to in the following text are from the header described on that page.
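For orientation, the three standard header fields used in the rest of this section could be read as in the sketch below. The field offsets (0x06, 0x08, 0x18) are those of the standard MZ header; the function name `read_mz_fields` is hypothetical.

```python
import struct

def read_mz_fields(exe: bytes) -> dict:
    """Read the standard MZ header fields referred to in this section."""
    if exe[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # numReloc (0x06) and pgHeader (0x08) are adjacent UINT16LE values.
    num_reloc, pg_header = struct.unpack_from("<HH", exe, 0x06)
    # offReloc is the UINT16LE at 0x18.
    (off_reloc,) = struct.unpack_from("<H", exe, 0x18)
    return {"numReloc": num_reloc, "pgHeader": pg_header, "offReloc": off_reloc}
```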

Within the MZ header, the documented fields finish at offset 0x1C. PKLite adds some extra fields at this offset:

  • At offset 0x1C from the start of the .exe there is a UINT8 for the minor version number, followed at 0x1D by another UINT8 for the major version number. Only the lower 4 bits of the major version byte appear to hold the version, so bytes 01 02 mean PKLite version 2.1 and bytes 05 21 mean PKLite version 1.5. The fifth (0x10) and sixth (0x20) bits of the major version byte are flags: 0x10 means "extra" compression (PKLite will refuse to decompress such a file) and 0x20 means "large" compression (see below). Use majorVersion & 0x0F to extract the actual major version number.
  • Following this, at offset 0x1E there is a copyright message that continues until the start of the relocation table (offReloc from the .exe header).
  • At offReloc (again a byte offset relative to the start of the .exe) the relocation table begins. Technically numReloc * 4 bytes should be skipped to jump over the relocation table, but there are always zero entries in it. Thus "after" the relocation table (at the same offset) there is extra data added by PKLite: the original .exe header, minus the "MZ" bytes. To restore the original .exe file, write out "MZ" followed by this extra data. The data continues until the end of the PKLite .exe header (pgHeader * 16).
  • The relocation table should then be written to the .exe file, but it is not available yet because it is located after the compressed code.
  • After this, at offset pgHeader * 16 the decompression routine starts. By seeking a further 0x4E bytes into this code and reading a UINT8, the offset of the compressed data can be obtained. (It still needs to be confirmed that this offset is the same for all PKLite versions.) This value is in paragraphs (16-byte blocks) from the start of the memory address where the .exe is loaded, so some calculation is needed to convert it to an offset in the .exe file. The value is relative to address 0x100, as this is where all .exe files have their code loaded, but it needs to be relative to where the code begins in the .exe file itself, which is at offset pgHeader * 16. Since 0x100 bytes is 0x10 paragraphs, the calculation is offset_of_compressed_data = (pgHeader + byte_from_offset_0x4e - 0x10) * 16.
  • Once the compressed data has been decompressed (see example code below) and the 0xFF special code has been received indicating the end of the data, the compressed relocation table can be read. Simply perform the following in a loop:
    • Read a UINT8 as count. If it is zero, exit the loop as the process is complete.
    • Read a UINT16LE as msb.
    • Loop count times, doing:
      • Read a UINT16LE as lsb.
      • Write a UINT32LE with msb as the upper 16 bits and lsb as the lower 16 bits, i.e. (msb << 16) | lsb.
  • Once the relocation table has been written to the .exe (after the original header), the following data remains after the compressed relocation table, here referred to as the PKLite footer:
Data type   Name          Description
UINT16LE    final_segSS   Final relative value for SS
UINT16LE    final_regSP   Final value for SP
UINT16LE    final_segCS   Final relative value for CS
UINT16LE    final_regIP   Final IP to jump to

This is for loading into the registers normally initialised by DOS when an EXE file is loaded, such that once the decompression process is finished, it looks like everything was set up as normal. These values are repeated here even though they are identical to the values in the original header because the original header is not loaded into memory by DOS. These values are not required to rewrite the original .exe file.
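The header fields, offset calculation, relocation table loop and footer described above might be read along the lines of the following Python sketch. All function names and dictionary keys other than the footer field names are hypothetical. The sketch assumes the 0x4E offset holds for the file in question (which is unconfirmed for some PKLite versions), and it does not perform the decompression itself: `pos` must already point just past the compressed data.

```python
import struct

def parse_pklite_fields(exe: bytes) -> dict:
    """Read the PKLite additions to the MZ header (hypothetical helper)."""
    # Standard MZ fields: numReloc at 0x06, pgHeader at 0x08, offReloc at 0x18.
    num_reloc, pg_header = struct.unpack_from("<HH", exe, 0x06)
    (off_reloc,) = struct.unpack_from("<H", exe, 0x18)
    minor, major_raw = exe[0x1C], exe[0x1D]
    code_start = pg_header * 16
    # UINT8 at 0x4E into the decompressor; this offset is not confirmed
    # for every PKLite version.
    paragraphs = exe[code_start + 0x4E]
    return {
        "version": (major_raw & 0x0F, minor),
        "extra": bool(major_raw & 0x10),
        "large": bool(major_raw & 0x20),
        "copyright": exe[0x1E:off_reloc],
        # Original header without "MZ"; numReloc is expected to be zero.
        "orig_header_tail": exe[off_reloc + num_reloc * 4:code_start],
        "compressed_offset": (pg_header + paragraphs - 0x10) * 16,
    }

def read_reloc_table(data: bytes, pos: int):
    """Decode the compressed relocation table starting at pos.

    Returns the list of 32-bit entries and the position just after the table.
    """
    relocs = []
    while True:
        count = data[pos]
        pos += 1
        if count == 0:
            break  # a zero count ends the table
        (msb,) = struct.unpack_from("<H", data, pos)
        pos += 2
        for _ in range(count):
            (lsb,) = struct.unpack_from("<H", data, pos)
            pos += 2
            relocs.append((msb << 16) | lsb)
    return relocs, pos

def read_footer(data: bytes, pos: int) -> dict:
    """Read the four UINT16LE footer values that follow the relocation table."""
    ss, sp, cs, ip = struct.unpack_from("<4H", data, pos)
    return {"final_segSS": ss, "final_regSP": sp,
            "final_segCS": cs, "final_regIP": ip}
```

Each entry returned by `read_reloc_table` is meant to be written back out as a UINT32LE, for example with `struct.pack("<I", entry)`.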

  • After writing out the relocation table, 0x00 bytes must be inserted to pad the data up until the code begins. The code starts at the offset indicated by pgHeader * 16 so 0x00 bytes must be added until this offset is reached.
  • After the padding, the decompressed code can be written and the original .exe file has been restored.
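Putting the pieces together, the output file could be assembled roughly as in the sketch below. The function `rebuild_exe` and its parameters are hypothetical. In particular, `header_paragraphs` is assumed to be the pgHeader value that marks where the code begins in the restored file (presumably taken from the restored original header), and `code` is assumed to be the already decompressed machine code.

```python
import struct

def rebuild_exe(orig_header_tail: bytes, relocs: list, code: bytes,
                header_paragraphs: int) -> bytes:
    """Assemble a decompressed .exe from its recovered parts."""
    out = bytearray(b"MZ")
    out += orig_header_tail              # original header, minus "MZ"
    for entry in relocs:
        out += struct.pack("<I", entry)  # each relocation is a UINT32LE
    code_start = header_paragraphs * 16
    if len(out) > code_start:
        raise ValueError("header and relocation table overrun the code start")
    # Pad with 0x00 bytes until the offset where the code begins.
    out += b"\x00" * (code_start - len(out))
    out += code
    return bytes(out)
```

Raising an error when the header and relocation table are longer than `header_paragraphs * 16` is a design choice for this sketch; it signals that the inputs are inconsistent rather than silently writing a corrupt file.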

Compression algorithm

The algorithm was originally documented as the OpenTESArena PKLite specification by Dozayon. It works for most files with "large" compression, but there are still some edge cases it does not handle, notably files that do not have the "large" compression flag set.

Example source code

A number of utilities were released in the 1990s that could decompress PKLite .exe files. However, these either patched and ran the x86 machine-code decompressor in the .exe file or did not come with source code, so they are of no help for documenting the decompression process here.

The following examples are the original source for the next generation of decompressors - those that are open source. Note that these only decompress the machine code block so that data from it can be extracted. They cannot write out a working decompressed .exe as that was not their use case:

The following example source code is able to decompress the whole file and produce a working decompressed .exe:

  • gamecompjs

Credits

The compression algorithm was reverse engineered and documented for OpenTESArena by Dozayon. Malvineous worked out how to find the start of the compressed data and how to write out the original .exe header and relocation table.