File format data types

From ModdingWiki

This is a list of all the data types used in the file format descriptions on the wiki. They are loosely based on common C/C++ data types, and should be used throughout the wiki for consistency.

Type list

Numeric values

Data type Description
UINT8 Unsigned 8-bit integer
UINT16LE Unsigned 16-bit integer in little-endian format
UINT16BE Unsigned 16-bit integer in big-endian format
UINT32LE Unsigned 32-bit integer in little-endian format
UINT32BE Unsigned 32-bit integer in big-endian format

Signed equivalents are the same without the leading U, i.e. INT8, INT16LE, etc. Unless otherwise stated, signed values are stored in two's complement (for example, the byte that reads as 255 when treated as a UINT8 reads as -1 when treated as an INT8).
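The two's complement rule above can be sketched in C as follows. This is only an illustration: the function names uint8_to_int8 and uint16_to_int16 are made up for this example, and the conversion is done arithmetically so that it does not rely on how a particular compiler converts out-of-range values to a signed type.

```c
#include <stdint.h>

/* Reinterpret an unsigned 8-bit value as a two's complement signed value.
   Values with the top bit set (0x80 and above) represent negative numbers. */
int uint8_to_int8(uint8_t raw)
{
    return (raw >= 0x80) ? (int)raw - 0x100 : (int)raw;
}

/* The same idea for a 16-bit value that has already been assembled
   from its bytes in the correct order. */
int32_t uint16_to_int16(uint16_t raw)
{
    return (raw >= 0x8000) ? (int32_t)raw - 0x10000 : (int32_t)raw;
}
```

With these helpers, uint8_to_int8(255) gives -1, matching the example in the text.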

Floating point values

Data type Description
Single 32-bit float (IEEE-754)
Real 48-bit float (Pascal specific)
Double 64-bit float (IEEE-754)

All floating point values are signed. They usually contain a sign bit, exponent, and mantissa.
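As a rough illustration of the sign, exponent, and mantissa fields, the sketch below splits a Single into its three parts. It assumes the compiler's float type is a 32-bit IEEE-754 value (true on most current platforms, but not guaranteed by C), and the struct and function names are hypothetical. It does not apply to the 48-bit Real type.

```c
#include <stdint.h>
#include <string.h>

/* Fields of a 32-bit IEEE-754 value (hypothetical struct for illustration). */
struct single_fields {
    unsigned sign;      /* 1 bit: 0 = positive, 1 = negative */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits: the fraction after the implicit leading 1 */
};

struct single_fields split_single(float value)
{
    uint32_t bits;
    struct single_fields f;

    /* Assumption: float is a 32-bit IEEE-754 value on this compiler. */
    memcpy(&bits, &value, sizeof bits);

    f.sign     = (unsigned)(bits >> 31);
    f.exponent = (unsigned)((bits >> 23) & 0xFF);
    f.mantissa = bits & 0x7FFFFF;
    return f;
}
```

For example, 1.0 has sign 0, a stored exponent of 127 (an actual exponent of 0 after removing the bias), and a mantissa of 0.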

Character strings

Characters and strings are all CP437 unless otherwise stated. On modern platforms, this means the bytes will need to be converted to the local character set (such as UTF-8) in order for the glyphs to match what was originally intended.

Data type Description
char[x] String x bytes/characters long, may or may not be null terminated (each use should indicate which)
char Single 8-bit character
ASCIIZ A C-style string (variable-length, terminated with a single NULL/0x00 value)
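The difference between char[x] and ASCIIZ matters when reading, because a char[x] field that fills all x bytes has no terminator. The following is a minimal sketch of one way to handle both; the function names are invented for this example, and the bytes are copied as-is, so they are still CP437 and would need a separate conversion before display.

```c
#include <stddef.h>
#include <string.h>

/* Copy a fixed-length char[x] field into a NUL-terminated C string.
   field_len is x; out must have room for field_len + 1 bytes.
   Copying stops early if the field contains a 0x00 terminator.
   Returns the length of the resulting string. */
size_t read_fixed_string(const unsigned char *field, size_t field_len, char *out)
{
    size_t len = 0;
    while (len < field_len && field[len] != 0x00) {
        out[len] = (char)field[len];
        len++;
    }
    out[len] = '\0';
    return len;
}

/* Measure an ASCIIZ string stored in a buffer of buf_len bytes without
   reading past the end of the buffer. Returns the number of bytes the
   string occupies including its terminator, or 0 if none was found. */
size_t asciiz_size(const unsigned char *buf, size_t buf_len)
{
    const unsigned char *end = memchr(buf, 0x00, buf_len);
    return end ? (size_t)(end - buf) + 1 : 0;
}
```

Bounding both functions by a known length avoids running off the end of the data when a file is truncated or a field is not terminated.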

Misc data types

Data type Description
BYTE Same as UINT8 but conceptually for generic data rather than numeric values (e.g. UINT8 would be used for a number, while a BYTE would be used for a bitfield)
BYTE[x] Block of data x bytes long

Big endian vs little endian

For numeric values larger than a single byte, the endianness specifies the order in which the value's bytes are stored. For example, a hex value of 0x1234AABB written to a file as a 32-bit integer will take up four bytes, as follows:

Endian Bytes in file
Big 12 34 AA BB
Little BB AA 34 12

In languages that allow direct memory access, such as C/C++, viewing an integer's memory as a byte array reveals the bytes in the order the platform stores them, which matches the row of the table above for that platform's endianness.
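A small sketch of that inspection in C is shown below. The function names are hypothetical, and memcpy is used to copy the bytes out rather than casting pointers.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value into out[0..3],
   lowest memory address first. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, 4);
}

/* Returns 1 if the least significant byte is stored first
   (a little-endian host), 0 otherwise. */
int host_is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(0x1234AABBu, b);
    return b[0] == 0xBB;
}
```

On a little-endian host such as an Intel PC, bytes_in_memory(0x1234AABB, ...) yields BB AA 34 12; on a big-endian host it yields 12 34 AA BB.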

Normally, when reading or writing a variable to a file, a programmer will simply pass the memory address of the variable, so the file mirrors the byte order in memory. This is no problem when the variable is read back on the same system, as the byte order will match. However, when reading data written on a system of the opposite endianness (for example, using an Intel PC to read files from a PowerPC Mac), the byte order will be the reverse of what the system expects, and the programmer must convert the values manually.
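One way to sidestep the problem, sketched below, is to read the raw bytes and assemble the value explicitly in the byte order the file format specifies. The result is then the same on any host, and no swap is needed. The function names are made up for this example and are not part of any library.

```c
#include <stdint.h>

/* Assemble a UINT16LE from two bytes: least significant byte first. */
uint16_t read_u16le(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

/* Assemble a UINT16BE from two bytes: most significant byte first. */
uint16_t read_u16be(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Assemble a UINT32LE from four bytes. */
uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Assemble a UINT32BE from four bytes. */
uint32_t read_u32be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) |
           (uint32_t)p[3];
}
```

Using the table above, read_u32be on the bytes 12 34 AA BB and read_u32le on the bytes BB AA 34 12 both give 0x1234AABB.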

Conversion examples

If a value is read on a system with the same endianness as the one that wrote it (little to little or big to big), no action is required. If the endianness differs, the bytes of each multi-byte value must be swapped. The following sections list examples for different programming languages.

C/C++

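#include <stdint.h>

// 16-bit
uint16_t in16 = 0x1234;
uint16_t out16 = (uint16_t)(((in16 & 0xFF) << 8) | (in16 >> 8));
// out16 should now be 0x3412

// 32-bit (unsigned, so shifting a byte into the top bits is well defined)
uint32_t in32 = 0x1234AABB;
uint32_t out32 =
  ((in32 & 0xFF) << 24) |
  ((in32 & 0xFF00) << 8) |
  ((in32 & 0xFF0000) >> 8) |
  (in32 >> 24);
// out32 should now be 0xBBAA3412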

C# .NET

Int16 ByteSwap16(Int16 inValue)
{
  return (Int16)(
    ((inValue & 0xFF) << 8) |
    ((inValue >> 8) & 0xFF)
    );
}

Int32 ByteSwap32(Int32 inValue)
{
  return (Int32)(
    ((inValue & 0xFF) << 24) |
    ((inValue & 0xFF00) << 8) |
    ((inValue & 0xFF0000) >> 8) |
    ((inValue >> 24) & 0xFF)
    );
}

Visual Basic .NET

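Function ByteSwap16(ByVal InValue As Int16) As Int16
  ' Combine in a wider Integer, then wrap into the Int16 range so the narrowing conversion cannot overflow
  Dim Swapped As Integer = ((InValue And &HFF) << 8) Or ((InValue >> 8) And &HFF)
  If Swapped > Int16.MaxValue Then Swapped -= &H10000
  Return CShort(Swapped)
End Function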

Function ByteSwap32(ByVal InValue as Int32) as Int32
  Return ((InValue And &HFF) << 24) Or _
         ((InValue And &HFF00) << 8) Or _
         ((InValue And &HFF0000) >> 8) Or _
         ((InValue >> 24) And &HFF)
End Function

Note: in .NET, the endianness of the platform can be determined from the value of System.BitConverter.IsLittleEndian.