EXE Format
Format type | Executable / Script |
---|---|
Type | Compiled |
Platform | MS-DOS |
Code | 16-bit x86 |
Hidden data? | Yes |
Games | Too many to list |
The EXE format is used by all but the smallest DOS games to store the machine code instructions for running the game itself.
File format
Signature
The first two characters of the file are "MZ". The header values that follow (see below) indicate the size of the .exe file, but this should not be relied upon because additional data is often appended to the end of the .exe. The other header fields can still be inspected to check that their values are sensible. The memory requirements in particular should never ask for more than 640 kB, since such a file would not run on the majority of DOS machines.
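The plausibility checks mentioned above could be sketched in Python as follows. This is only an illustration, not a definitive validator: the function name `looks_sensible`, the helper `make_header` and the exact set of checks are assumptions made for this example. The field offsets come from the header table in the next section, and the memory test uses the minimum requirement (loaded code plus pgMemExtra) rather than pgMemMax.

```python
import struct

CONVENTIONAL_MEMORY = 640 * 1024  # the 640 kB limit, in bytes


def looks_sensible(data):
    """Return True if the MZ header fields pass a few plausibility checks."""
    if len(data) < 0x1C or data[:2] != b"MZ":
        return False
    # lenFinal, numBlocks, numReloc, pgHeader, pgMemExtra start at offset 2.
    len_final, num_blocks, num_reloc, pg_header, pg_mem_extra = \
        struct.unpack_from("<5H", data, 2)
    (off_reloc,) = struct.unpack_from("<H", data, 0x18)

    # Size of header plus code according to the block counts.
    len_image = num_blocks * 512
    if len_final != 0:
        len_image += len_final - 512
    header_bytes = pg_header * 16

    # The header must hold the fixed fields and fit inside the image.
    if not 0x1C <= header_bytes <= len_image:
        return False
    # The relocation table must end before the machine code starts.
    if num_reloc and off_reloc + num_reloc * 4 > header_bytes:
        return False
    # Loaded code plus the extra memory it insists on must fit in 640 kB.
    needed = (len_image - header_bytes) + pg_mem_extra * 16
    return needed <= CONVENTIONAL_MEMORY


def make_header(pg_mem_extra):
    """Hypothetical test file: 32-byte header followed by 16 bytes of 'code'."""
    fields = (b"MZ", 48, 1, 0, 2, pg_mem_extra, 0xFFFF, 0, 0x80, 0, 0, 0, 0x1C, 0)
    return struct.pack("<2s6Hh3Hh2H", *fields) + bytes(4 + 16)


print(looks_sensible(make_header(0x10)))    # True
print(looks_sensible(make_header(0xFFFF)))  # False: asks for about 1 MB extra
```

A file that fails these checks is not necessarily invalid, so a tool would probably treat the result as a hint rather than a verdict.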
Header
Data type | Name | Description |
---|---|---|
char | signature[2] | "MZ" |
UINT16LE | lenFinal | Size of final block in the file, or 0 if the final block is a full 512 bytes. |
UINT16LE | numBlocks | Number of 512-byte blocks in the file, including the final block which may not be the full 512 bytes. |
UINT16LE | numReloc | Number of entries in the relocation table. |
UINT16LE | pgHeader | Number of 16-byte blocks (paragraphs) in the file header, including the "MZ" and up until the machine code starts. |
UINT16LE | pgMemExtra | Number of 16-byte paragraphs of extra memory that must be allocated for the program to run. |
UINT16LE | pgMemMax | If the program doesn't need all available memory, this value can limit it to a smaller number of paragraphs. |
INT16LE | segSS | Stack segment offset. This value is added to the load segment and set as the stack segment (SS register). |
UINT16LE | regSP | Initial value of the SP (stack pointer/index into stack) register. The stack is located at the segment:offset SS:SP. |
UINT16LE | checknum | Usually ignored and set to zero. Supposedly if all UINT16LE values in the file are summed they should equal zero, and this value can be tweaked to ensure that is the case. |
UINT16LE | regIP | Initial value of the IP (instruction pointer) register, where execution will actually begin. This is commonly 0x100 in programs laid out like .com files, because DOS places some information (the PSP) in the 0x100 bytes immediately before the loaded code. Padding the start of the machine code with 0x100 zero bytes would make room for this but wastes space in the .exe file, so such programs instead set regCS to -16 (paragraphs), which moves the start of the code segment back by 0x100 bytes. The PSP then occupies offsets 0 to 0x100 of the code segment, the code loaded from the file appears at offset 0x100, and execution begins there without any padding being needed. |
INT16LE | regCS | Offset for the CS (code segment) register relative to the segment the code was loaded in. Execution begins at CS:IP so a value of 1 would skip the first 16 bytes of code that follows the .exe header. May be negative. |
UINT16LE | offReloc | Offset of the first entry in the relocation table, relative to the start of the .exe file. Increasing this value allows more data to be stored in the .exe header. PKLite for example uses it to store a copyright message. |
UINT16LE | overlayIndex | Overlay number, or zero for the main .exe file. Rarely used. |
Additional data can appear in the header until offReloc is reached where the relocation table begins.
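As a rough sketch, the header table above maps onto Python's `struct` module as shown here. The names `MZ_STRUCT`, `MZHeader`, `parse_mz_header`, `header_extra_data` and `entry_point_offset` are invented for this example, and the sample bytes are synthetic rather than taken from a real game.

```python
import struct
from collections import namedtuple

# "<" = little-endian, "2s" = two chars, "H" = UINT16LE, "h" = INT16LE.
MZ_STRUCT = struct.Struct("<2s6Hh3Hh2H")
MZHeader = namedtuple("MZHeader", [
    "signature", "lenFinal", "numBlocks", "numReloc", "pgHeader",
    "pgMemExtra", "pgMemMax", "segSS", "regSP", "checknum",
    "regIP", "regCS", "offReloc", "overlayIndex",
])


def parse_mz_header(data):
    """Decode the fixed 28-byte part of the header."""
    if len(data) < MZ_STRUCT.size:
        raise ValueError("file too short for an MZ header")
    header = MZHeader._make(MZ_STRUCT.unpack_from(data))
    if header.signature != b"MZ":
        raise ValueError("missing MZ signature")
    return header


def header_extra_data(data, header):
    """Bytes stored between the fixed fields and the relocation table."""
    return data[MZ_STRUCT.size:header.offReloc]


def entry_point_offset(header):
    """File offset of the first instruction, i.e. CS:IP applied to the code."""
    return header.pgHeader * 16 + header.regCS * 16 + header.regIP


# Synthetic file: regCS = -16 and regIP = 0x100, with four bytes of extra
# header data placed before the (empty) relocation table at offset 0x20.
sample = MZ_STRUCT.pack(b"MZ", 64, 1, 0, 2, 0, 0xFFFF, 0, 0x80, 0,
                        0x100, -16, 0x20, 0)
sample += b"DEMO" + bytes(32)
header = parse_mz_header(sample)
print(header_extra_data(sample, header))  # b'DEMO'
print(entry_point_offset(header))         # 32, the first byte of the code
```

With regCS = -16 and regIP = 0x100 the two adjustments cancel out, so execution starts at the first byte after the header.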
Relocation table
The relocation table consists of numReloc four-byte entries, beginning at offset offReloc from the start of the file.
Additional data can appear after the relocation table, until offset pgHeader * 16 where the machine code begins.
The relocation table lists all the instances in the machine code part of the executable that contain a segment address, either in the form of a pure segment address or as part of a far jump, far pointer or a far function call. These segment addresses are stored in the file as relative values (relative to the beginning of the machine code) and need to be adjusted ("relocated") to the absolute memory position into which the executable is loaded.
Each entry in the relocation table consists of two UINT16LE values. The first value is an offset and the second one is a paragraph address. The full address can be calculated from this as offset + paragraph * 16. This address itself is relative to the beginning of the machine code, so adding pgHeader * 16 to it gives the absolute position inside the .exe file where the segment address that needs to be adjusted is stored. Relocation modifies the 16-bit value at that position, i.e. that byte and the one following it.
When modifying the code inside an executable file, it is vital to keep track of the bytes that need relocation, as the relocation process will modify those bytes and potentially break whatever code was inserted at that point. While it is possible to work around that limitation, the better alternative is to just add or remove relocation table entries as required.
It does not seem to matter how the full address is split into the offset and paragraph parts for the relocation table entries, as long as the offset is never set to 0xFFFF and offset + paragraph * 16 matches the intended address. Setting the paragraph address to address / 16 and the offset to address % 16 should work fine for any address.
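The address arithmetic described in this section can be sketched as follows. The function names and the synthetic sample are assumptions made for illustration. `relocate` patches a copy of the file bytes rather than a real memory image, and it assumes that relocation means adding the load segment to each listed value.

```python
import struct


def read_relocations(data):
    """Return the file offset of every 16-bit value listed for relocation."""
    num_reloc, pg_header = struct.unpack_from("<HH", data, 6)
    (off_reloc,) = struct.unpack_from("<H", data, 0x18)
    positions = []
    for i in range(num_reloc):
        offset, paragraph = struct.unpack_from("<HH", data, off_reloc + i * 4)
        address = offset + paragraph * 16  # relative to start of machine code
        positions.append(pg_header * 16 + address)
    return positions


def relocate(data, load_segment):
    """Add load_segment to each listed value, wrapping at 16 bits."""
    image = bytearray(data)
    for pos in read_relocations(data):
        (value,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (value + load_segment) & 0xFFFF)
    return bytes(image)


def split_address(address):
    """Split a code-relative address into (offset, paragraph) for a new entry."""
    return address % 16, address // 16


# Synthetic file: 28-byte header, one relocation entry, 16 bytes of code
# holding the relative segment value 0x0001 at code offset 3.
header = struct.pack("<2s6Hh3Hh2H", b"MZ", 48, 1, 1, 2, 0, 0xFFFF,
                     0, 0x80, 0, 0, 0, 0x1C, 0)
table = struct.pack("<HH", 3, 0)
code = bytearray(16)
code[3:5] = struct.pack("<H", 0x0001)
sample = header + table + bytes(code)

print(read_relocations(sample))  # [35]
```

`split_address` follows the suggestion above of using address % 16 as the offset and address / 16 as the paragraph, which keeps the offset well away from 0xFFFF.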
Machine code
The actual machine code that runs begins at the end of the .exe header (file offset pgHeader * 16) and continues until the last (possibly partial) block has been read. This can be calculated like so:
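```c
/* Size of the file as recorded in the header (header plus machine code). */
unsigned long lenCode = numBlocks * 512UL;
if (lenFinal != 0) {
    /* The last block is not a full 512 bytes, so adjust for it. */
    lenCode = lenCode - 512 + lenFinal;
}
/* numBlocks counts the header too, so subtract it to leave only the code. */
lenCode -= pgHeader * 16UL;
```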
After this point it is possible for more data to be stored in the .exe, but it will not be loaded into memory by DOS and is essentially ignored. Some games use this to store extra game data in their main .exe, opening their own .exe file at runtime to read out the extra data. These can often be spotted by the fact that the .exe file is much larger than one that will typically fit into memory.
To calculate the offset of this extra data, find the beginning of the code (pgHeader * 16) and add the size of the code (lenCode above). This will be the offset of the trailing data, or if there is none, it will match the size of the .exe file itself.
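A minimal sketch of locating the trailing data, assuming (as the numBlocks description states) that the block count covers the whole file including the header. Under that assumption the code length is the block-derived size minus the header, and the trailing data starts where the block-derived size ends. The function names and the sample file are hypothetical.

```python
import struct


def mz_layout(data):
    """Return (code_start, len_code, trailing_start) for an MZ file."""
    len_final, num_blocks, _num_reloc, pg_header = \
        struct.unpack_from("<4H", data, 2)
    len_image = num_blocks * 512
    if len_final != 0:
        # The last block is partial, so only lenFinal bytes of it count.
        len_image = len_image - 512 + len_final
    code_start = pg_header * 16
    len_code = len_image - code_start
    # code_start + len_code equals len_image, where any extra data begins.
    return code_start, len_code, code_start + len_code


def trailing_data(data):
    """Bytes DOS would not load; empty if the file ends with the code."""
    _code_start, _len_code, trailing_start = mz_layout(data)
    return data[trailing_start:]


# Synthetic file: 32-byte header, 100 bytes of code, 5 appended bytes.
header = struct.pack("<2s6Hh3Hh2H", b"MZ", 132, 1, 0, 2, 0, 0xFFFF,
                     0, 0x80, 0, 0, 0, 0x1C, 0) + bytes(4)
sample = header + bytes(100) + b"EXTRA"

print(mz_layout(sample))      # (32, 100, 132)
print(trailing_data(sample))  # b'EXTRA'
```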
Compression
A few schemes exist to compress .exe files while still leaving them executable. These work by compressing the original code and inserting a small decompression routine at the start of the data. When DOS loads the .exe into memory, it ends up running the decompression routine instead of the original code. This routine expands the compressed data and then passes control to the expanded code, so the program appears to work as normal.
The trick with this process is decompressing the data while using as little extra memory as possible. Most .exe files take up large amounts of memory, and there is typically not enough to store both the compressed and decompressed code at the same time. The decompressor must therefore expand the data carefully, overwriting only compressed data that has already been decompressed and is no longer needed.
Typically the .exe file headers are set up such that DOS will load both the decompression routine and the compressed code into memory at the same time, so the decompressor does not have to worry about opening and reading files.
The individual compression schemes are documented on their own pages, but they refer back to the .exe headers described here. There are many more schemes than listed here as this list only includes those used by games documented on this wiki.
Successors
The .exe format has been extended to add functionality and to work on other platforms:
- New Executable (NE) Format was introduced with 16-bit Windows.
- Linear Executable (LX/LE) Format adds 32-bit support and was used by many DOS games from the late 1990s running in protected mode (DOS/4GW).
- Portable Executable (PE) Format was introduced with 32-bit Windows.
Most of these extensions are backwards compatible in that a small "MZ" stub is included that prints an error if the file is run on a platform that does not support the extensions.
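One way a tool might tell these successors apart from a plain DOS .exe is sketched below. It relies on a convention not described on this page: the extended formats store a 32-bit little-endian offset at file offset 0x3C that points to a second header starting with "NE", "LE", "LX" or "PE\0\0". The function name and sample bytes are made up for this example, and a plain DOS file may hold unrelated data at 0x3C, so the result is only a hint.

```python
import struct

# Signatures of the second header, keyed by their first two bytes.
EXTENDED_SIGNATURES = {
    b"NE": "New Executable (NE)",
    b"LE": "Linear Executable (LE)",
    b"LX": "Linear Executable (LX)",
    b"PE": "Portable Executable (PE)",
}


def extended_format(data):
    """Return the name of the extended format, or None for a plain MZ file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (off_new,) = struct.unpack_from("<I", data, 0x3C)
    signature = data[off_new:off_new + 2]
    if signature == b"PE" and data[off_new:off_new + 4] != b"PE\0\0":
        return None  # PE uses a four-byte signature
    return EXTENDED_SIGNATURES.get(signature)


# Synthetic stub: "MZ", padding up to 0x3C, then a pointer to a PE signature.
pe_stub = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40) + b"PE\0\0"
plain = b"MZ" + bytes(0x3E)

print(extended_format(pe_stub))  # Portable Executable (PE)
print(extended_format(plain))    # None
```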
Tools
The following tools are able to work with files in this format.
Name | Platform | Load? | Decompress? | Create? | Modify? | Compress? | Access hidden data? | Notes |
---|---|---|---|---|---|---|---|---|
MS-DOS debug.exe | MS-DOS | Yes | No | No | No | No | Yes | Loads into correct offsets, disassemble with "u" command. |
LZEXE | MS-DOS | No | No | No | No | Yes, LZEXE | No | |
PKLite | MS-DOS | No | Yes, PKLite | No | No | Yes, PKLite | No | |
UNLZEXE | MS-DOS | No | Yes, LZEXE | No | No | No | No | |
mz-explode | Portable | No | Yes, LZEXE, PKLite, KNOWLEDGE DYNAMICS, EXEPACK | No | No | No | No | C++ library |
Credits
The .exe structure was taken from the DJGPP documentation. If you find this information helpful in a project you're working on, please give credit where credit is due. (A link back to this wiki would be nice too!)