The CUR format is used by Prehistorik to store its data files. Some of the files it contains are compressed with LZSS while others are not, and there is no indicator other than the filename extension.
There is no signature, but careful parsing of the FAT should confirm a number of things:
- The FAT length should be shorter than the archive file length
- The FAT length must be at least six bytes (the two bytes for the length field itself plus the four-byte terminator)
- The sizes of all the files, plus the length of the FAT, should add up to exactly the archive file length
- The filenames should contain no control characters
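The checks above can be sketched as a single detection routine. This is a minimal sketch in Python; the function name is illustrative, and it assumes the FAT entry layout described further below (a UINT32LE size, a null-terminated filename, and a four-byte terminator):

```python
import struct

def looks_like_cur(data: bytes) -> bool:
    """Heuristic signature check for a CUR archive."""
    if len(data) < 6:
        return False                     # too small even for an empty FAT
    (off_start,) = struct.unpack_from('<H', data, 0)
    if off_start < 6 or off_start > len(data):
        return False                     # FAT must fit inside the archive
    pos = 2
    total = off_start                    # file sizes plus FAT length must equal the archive length
    while True:
        if pos + 4 > off_start:
            return False                 # ran past the FAT without seeing a terminator
        (len_file,) = struct.unpack_from('<I', data, pos)
        pos += 4
        if len_file == 0:
            break                        # four 0x00 bytes: end of FAT
        total += len_file
        end = data.find(0, pos, off_start)
        if end < 0:
            return False                 # filename is not null-terminated within the FAT
        if any(c < 0x20 for c in data[pos:end]):
            return False                 # control characters in a filename
        pos = end + 1
    return total == len(data)
```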
The file begins with a short header:
|Data type||Name||Description|
|UINT16LE||offStart||Offset of the first file's data|
The offStart value gives the offset of the first file's data; the offset of each subsequent file is calculated by adding the sizes of all preceding files to this value. offStart also indicates how much data must be read to process the whole FAT.
As this field is a 16-bit value, the FAT can be at most 65,535 bytes long, of which two bytes are taken by the length field itself and four by the terminator, leaving 65,529 bytes for file entries. Given the variable length of filenames, the maximum number of files varies. With 12-character (8.3) filenames each entry occupies 17 bytes, allowing at most 3,854 files, while reducing every filename to a single character (six bytes per entry) allows up to 10,921 files.
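These limits can be checked with a quick calculation, using the entry layout described below (a four-byte size, the filename, and a null byte). The figures assume the four-byte terminator must be reserved inside the FAT alongside the entries:

```python
# Maximum space for entries: 16-bit offStart, minus the 2-byte length
# field itself, minus the 4-byte terminator after the last entry.
fat_space = 0xFFFF - 2 - 4

def max_files(filename_len: int) -> int:
    """Upper bound on file count for a given (uniform) filename length."""
    entry_size = 4 + filename_len + 1   # UINT32LE size + name + null byte
    return fat_space // entry_size

print(max_files(12))  # 8.3-style filenames
print(max_files(1))   # single-character filenames
```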
A file entry consists of the filename and its size, and is repeated back-to-back until the terminator is reached.
|Data type||Name||Description|
|UINT32LE||lenFile||Size of the file data, in bytes|
|char[...]||filename||Filename, all uppercase, variable length|
|char||null||0x00 byte terminating the filename|
After the last file entry there is a terminator consisting of four 0x00 bytes. In terms of the structure above, this appears as a lenFile field of zero: when it is encountered, stop reading the FAT without attempting to read any further filename fields.
A file with a size of zero TODO: Has no effect, or terminates the FAT early?
Each file's offset must be calculated as the previous file's offset plus the previous file's size; the first file's data is located at offStart, measured from the start of the archive.
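Putting the header, entry, and terminator rules together, the FAT can be parsed and each file's offset computed as follows (a Python sketch; the function name is illustrative):

```python
import struct

def read_fat(data: bytes):
    """Parse a CUR archive's FAT, returning (filename, offset, size) tuples."""
    (off_start,) = struct.unpack_from('<H', data, 0)
    pos = 2
    offset = off_start              # the first file's data starts at offStart
    entries = []
    while True:
        (len_file,) = struct.unpack_from('<I', data, pos)
        pos += 4
        if len_file == 0:           # four 0x00 bytes terminate the FAT
            break
        end = data.index(0, pos)    # filename runs up to the 0x00 byte
        name = data[pos:end].decode('ascii')
        pos = end + 1
        entries.append((name, offset, len_file))
        offset += len_file          # the next file follows immediately
    return entries
```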
Note that the filenames are variable length with no apparent limit, which can make updating the archive file a challenge. In practice the filenames are probably capped at some value not yet discovered, most likely 12 characters, since all known filenames follow the 8.3 convention.
Those files that are compressed start with a UINT32BE value storing the uncompressed size (in bytes), followed by the compressed data. The compression algorithm is LZSS with a two-bit length field, an eight-bit distance field, and a one-bit code field where 0 signals an escape and 1 signals a length/distance pair. In a pair, the length field is stored first, followed by the distance field. As with the field storing the uncompressed size, all bits are decoded in big-endian (most significant bit first) order.
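The scheme can be sketched as follows. Note that several details are not stated above and are assumptions in this sketch: that a 0 code bit is followed directly by an eight-bit literal, that the two-bit length field encodes a copy of (value + 2) bytes, and that the eight-bit distance field encodes (value + 1) bytes back. These biases are guesses and may need adjusting against real game data:

```python
import struct

def decompress(data: bytes) -> bytes:
    """LZSS decoder for compressed CUR entries, bits read MSB-first.

    The +2 length bias and +1 distance bias are assumptions, not
    confirmed by the format description.
    """
    (out_len,) = struct.unpack_from('>I', data, 0)  # UINT32BE uncompressed size
    bitpos = 32                     # bit cursor into the buffer, MSB-first

    def bits(n):
        nonlocal bitpos
        value = 0
        for _ in range(n):
            byte = data[bitpos >> 3]
            value = (value << 1) | ((byte >> (7 - (bitpos & 7))) & 1)
            bitpos += 1
        return value

    out = bytearray()
    while len(out) < out_len:
        if bits(1) == 0:            # escape: literal byte follows
            out.append(bits(8))
        else:                       # length/distance pair, length stored first
            length = bits(2) + 2    # assumed bias
            distance = bits(8) + 1  # assumed bias
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)
```

Decoding stops once the declared uncompressed size has been produced, so any padding bits at the end of the stream are ignored.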
As far as is known, the .mat (except charset1.mat), .mdi and .pc1 files are always compressed, and all other files are uncompressed.
While unlikely to be encountered in practice, it is possible to store data between the terminating file entry (the one with a zero file size) and the start of the first file's data.
It is also theoretically possible to store data after the end of the last file, which would remain hidden as there would be no file entry pointing to it.
The following tools are able to work with files in this format.
|Name||Platform||Extract files?||Decompress on extract?||Create new?||Modify?||Compress on insert?||Access hidden data?||Edit metadata?||Notes|
This file format, including the compression algorithm, was reverse-engineered by Malvineous. If you find this information helpful in a project you're working on, please give credit where credit is due. (A link back to this wiki would be nice too!)