Huffman Compression

Format type	Compression algorithm
Type	Stream
I/O unit size	1-byte in, 1-bit out
Games	The Blues Brothers Catacomb 3-D Commander Keen Dreams Dangerous Dave 2

Huffman Compression is a compression algorithm used in many classic games. It is about as efficient as LZW or RLE compression, with many games using the three formats simultaneously.

Huffman compression involves changing how various characters are stored. Normally all characters in a given segment of data are equal and take an equal amount of space to store. However by making more common characters take up less space while allowing less commonly used characters to take up more the overall size of a segment of data can be reduced. Any Huffman compressed data will thus be associated with a dictionary file (internal or external) used to decompress it. This page will go through each aspect of Huffman compression in a stepwise manner before presenting some examples of usage and source code.

Frequency tables

To illustrate the various aspects of Huffman compression the example text "She sells seashells by the sea shore." will be Huffman compressed and the processes and outputs shown in various sections on this page.

In order to compress data using Huffman compression the compressor must first build a 'frequency table' of all the elements it wishes to compress. In most cases this will be the 256 values a single byte can take, though it need not be limited to this. (Due to the binary nature of the compression method, sets of elements that match powers of 2 are both easiest and most efficient to compress.) The data is scanned and each occurrence of a given value. Taking the letters in our example sentence we can construct the following frequency table:

 Char   Freq
 's'      7
 'e'      7
 ' '      6
 'h'      4
 'l'      4
 'a'      2
 'S'      1
 'b'      1
 'o'      1
 'r'      1
 't'      1
 'y'      1
 '.'      1

The Binary Tree

The binary tree is core to how Huffman compression compresses data. This is done by constructing a 'binary tree', so named because of its branching structure. Each branching point or 'node' has two options, 'left' and 'right' which lead either to another node or a character. The tree is constructed so that every character can be placed upon it and every path ends in a character.

This is done by first building a point of origin, 'root node' and the adding further nodes from there. Using our example we can build the following binary tree simply by listing the nodes alphabetically (and using '_' to stand in for the space for clarity):

           [Root]
         ____|_______
       /             \
     __|__            |
    /      \        /   \
    |       |       |    .
  /   \   /   \   /   \
  |   |   |   |   |   |
 / \ / \ / \ / \ / \ / \
 _ S a b e h l o r s t y

On this tree we can use 'L' to specify a left path and 'R' to specify a right. As such the letter 'a' is represented by 'LLRL' and 'y' by 'RLRR'. Note that since the number of different characters is not a power of two (E.g. 16) not all paths from the root to a character are the same length, one is considerably shorter. The '.' character is represented by simply 'RR'

Assuming that 'R' and 'L' are '0' and '1' each 8-bit character in the original text can be represented by just 4 bits on average, halving the message length. This is far from optimal however and there is a specific procedure used to construct an optimal tree for the shortest message length. To do this advantage is taken of the fact that paths can be different lengths. If one set of paths is made a bit shorter at the expense of making a smaller set of paths a bit longer then the overall message will be made shorter.

Constructing an optimal tree

The first step in creating an optimal tree again is to create a 'root node'. To populate its two options the compressor repeatedly divides the frequency table. This is done by starting with the most common character and seeing if it contains more occurrences than the rest of the table combined. If not then the second most common character is added and so on.

In our example there are a total of 37 characters. Three characters make up more than half of this, 's', 'e' and ' '. Because of the binary structure of the tree however four options are needed here so 'h' is also included. All of these characters are placed on the left option of the root node while the remaining characters are placed on the right.

The next step creates a new node with left and right options and repeats the process. The two most common characters of the four, 's' and 'e' go to the left branch of this new node. A third node is created and, now that there are only two options, each is given a branch. The tree now looks like this:

  [Root]
    / \
    |  l, a, etc
   / \
   | _, h
 /  \
 s  e

Now the character 's' can be represented as LLL and 'e' as LLR.

The compressor now moves 'up' to the previous node which still has a branch with more than one option in it. Another node is created and, since there are only two options, one is assigned to each branch. Te compressor then returns to the root node to deal with the right half of the tree. When it has assigned very character to a node the following optimal tree is produced:

      ___[Root]____
     /             \
    _|_         ___|_______
   /   \       /           \
   |   |       |            |
 /  \ / \     / \          / \
 s  e _ h    l   |        |   |
                / \      / \ / \
               a  |      o r t |
                 / \          / \
                 S b         y  .

Notice that 4 characters now have paths that are just 3 choices long while 4 have paths 4 choices long and 4 have paths 5 choices long. Overall the average path length is the same as our previous tree. However 24 occurrences are one choice shorter, 5 are identical, 3 are one choice longer and '.' is three choices longer. This gives us a compressed message that is a total of 18 choices (or bits) shorter.

Labeling the tree

Now that a tree has been constructed it needs to be labelled in some form so that it can be stored as data either in the decompressor or to be sent with the compressed code. There are two major issues, what system to use when assigning numbers to the various nodes, and what bit value, 0 or 1, to assign to left or right options.

The most common (but far from unique) system is to label nodes from the 'bottom' of the tree to the top, going left-to-right across each 'layer' of the tree. (In our example the node that leads to 'S' and 'b' would thus be the first numbered node.) Nodes are usually numbered starting with 0 and ending with the root node having the highest number. Our tree would be numbered as follows using this system:

      ____[11]_____
     /             \
    [9]         __[10]_____
   /   \       /           \
  [5] [6]     [7]          [8]
 /  \ / \     / \          / \
 s  e _ h    l  [2]      [3] [4]
                / \      / \ / \
               a [0]     o r t[1]
                 / \          / \
                 S b         y  .

Usually the left branch is given a bit value of 0 and the right branch a bit value of 1. Using this scheme 's' in our example resides on node 1 and is represented by the bit string '000' The complete compressed message in bit form is now 10110 011 001 010 000 001 100 100 000 010 000 001 1010 000 011 001 100 100 000 010 10111 11110 010 1110 011 001 010 000 001 1010 010 000 011 1100 1101 001 11111 or in bytes $B3 $28 $19 $02 $06 $83 $32 $05 $7F $2E $65 $03 $48 $3C $D3 $F0 (The final byte is padded with nul bits.) This is a total of 16 bytes used for a 37 byte message.

Standard tree size

The most common implementation of Huffman compression is '1 byte in, 1 bit out'. This involves compressing one byte of data at a time and converting it into a bitstream. In this case the compressor expects to find 256 different possible values of data. This requires a tree with 255 nodes, including the root, and an average path length of 8 choices.

This is not the only possibility however and Huffman can compress data in words or dwords or in more exotic manners.

Commander Keen Dreams

for example combines Huffman with LZW compression, building a binary tree of a fixed size and filling it with the most recently encountered strings of bits.

The dictionary

Often the binary tree must be stored in some form, usually called a dictionary. This will either be fixed and known by both the compressor and decompressor. (Which minimizes individual message sizes but can limit compression efficiency overall.) or tailored for each message and sent with said message.

Dictionary format is quite standard. It is merely a numerical list of nodes (however they have been labelled.) starting with node 0 and ending with the root. Each node usually taking up four bytes, two bytes per branch. Each branch will have one byte dedicated to identifying if it leads to another node or a character and the other byte identifying the node or character. Usually '0' identifies the option asleading to a character and '1' to another node, but again this can be reversed.

Using this system, our example tree could be stored as either of the two following:

   0S0b 0y0. 0a10 0o0r 0t11 0s0e 0_0h 0l12 1314 1516 1718 19110

   $00 $53 $00 $62 $00 $79 $00 $2E $00 $6F $00 $72 $00 $74 $01 $01 $00 $73 $00 $65 $00 $20 $00 $68...
    -   S   -   b   -   y   -   .   -   o   -   r   -   t   |  n1   -   s    -  e   -   _   -   h ...
        (node 0)         (node 1)       (node 3)        (node 4)        (node 5)      (node 6)    ...

The dictionary is thus usually 1020 bytes long, containing 255 nodes of 4 bytes each, though again this can vary depending on how readable the dictionary is to humans and whether or not the dictionary itself is compressed.

Huffman implementation in ID Software games

Games created by Id Software have a shared but distinct implementation of Huffman compression.

In older games such as

Dangerous Dave 2

the dictionary and data are included together in one file. First is a four-byte header,, 'HUFF' followed by four bytes that give the decompressed size of the data (for memory purposes.). Following that is the dictionary, 1020 bytes in length then the compressed data.

In newer games (such as Commander Keen 4-6 or Wolfenstein 3-D) the data is stored in an external file which consists of a number of concatenated 'chunks', each chunk beginning with a dword giving its decompressed size (again for memory purposes.). These chunks are addressed by an internal or external 'header' file consisting of a series of three (or sometimes four) byte offsets to the start of various chunks in the data file. (The end of the header is easily found as the last entry is the length of the data file.) The dictionary is 1024 bytes long and stored in the executable.

The dictionary consists of 255 nodes (in newr games plus four null bytes of padding) and there are two or three dictionaries stored in the executable depending on the game. The end of these dictionaries can be found by looking for node 255 (number 254!) which universally consists of the string $00 $00 $FD $01 due to the fact that the byte $00 is the single most common character in game data and thus always occupies the left branch of the root node while the other branch must go to the next lowest node, number 253.

An important but subtle difference is how data is read. While bytes are read sequentially (1, 2, 3...) bits within each byte are read in reverse order (8, 7, 6...). (Technically each byte is little endian.)

Trivial ID dictionary

The following is a complete 'trivial' dictionary. This dictionary is constructed such that data compressed with it is not altered in any way, the output and input are the same. When used to construct a binary tree all paths are of equal length, 8 choices or bits. And the nodes are arranged in a logical order with an easily seen pattern; the first half of the tree consists of terminal nodes, the second half of branch nodes. (Of the second half the first half of that consists of branch nodes that lead to terminal nodes while the second half consists of branch nodes to two branch nodes and so on.)

$00 $00 $80 $00 $40 $00 $C0 $00 $20 $00 $A0 $00 $60 $00 $E0 $00
$10 $00 $90 $00 $50 $00 $D0 $00 $30 $00 $B0 $00 $70 $00 $F0 $00
$08 $00 $88 $00 $48 $00 $C8 $00 $28 $00 $A8 $00 $68 $00 $E8 $00
$18 $00 $98 $00 $58 $00 $D8 $00 $38 $00 $B8 $00 $78 $00 $F8 $00
$04 $00 $84 $00 $44 $00 $C4 $00 $24 $00 $A4 $00 $64 $00 $E4 $00
$14 $00 $94 $00 $54 $00 $D4 $00 $34 $00 $B4 $00 $74 $00 $F4 $00
$0C $00 $8C $00 $4C $00 $CC $00 $2C $00 $AC $00 $6C $00 $EC $00
$1C $00 $9C $00 $5C $00 $DC $00 $3C $00 $BC $00 $7C $00 $FC $00
$02 $00 $82 $00 $42 $00 $C2 $00 $22 $00 $A2 $00 $62 $00 $E2 $00
$12 $00 $92 $00 $52 $00 $D2 $00 $32 $00 $B2 $00 $72 $00 $F2 $00
$0A $00 $8A $00 $4A $00 $CA $00 $2A $00 $AA $00 $6A $00 $EA $00
$1A $00 $9A $00 $5A $00 $DA $00 $3A $00 $BA $00 $7A $00 $FA $00
$06 $00 $86 $00 $46 $00 $C6 $00 $26 $00 $A6 $00 $66 $00 $E6 $00
$16 $00 $96 $00 $56 $00 $D6 $00 $36 $00 $B6 $00 $76 $00 $F6 $00
$0E $00 $8E $00 $4E $00 $CE $00 $2E $00 $AE $00 $6E $00 $EE $00
$1E $00 $9E $00 $5E $00 $DE $00 $3E $00 $BE $00 $7E $00 $FE $00
$01 $00 $81 $00 $41 $00 $C1 $00 $21 $00 $A1 $00 $61 $00 $E1 $00
$11 $00 $91 $00 $51 $00 $D1 $00 $31 $00 $B1 $00 $71 $00 $F1 $00
$09 $00 $89 $00 $49 $00 $C9 $00 $29 $00 $A9 $00 $69 $00 $E9 $00
$19 $00 $99 $00 $59 $00 $D9 $00 $39 $00 $B9 $00 $79 $00 $F9 $00
$05 $00 $85 $00 $45 $00 $C5 $00 $25 $00 $A5 $00 $65 $00 $E5 $00
$15 $00 $95 $00 $55 $00 $D5 $00 $35 $00 $B5 $00 $75 $00 $F5 $00
$0D $00 $8D $00 $4D $00 $CD $00 $2D $00 $AD $00 $6D $00 $ED $00
$1D $00 $9D $00 $5D $00 $DD $00 $3D $00 $BD $00 $7D $00 $FD $00
$03 $00 $83 $00 $43 $00 $C3 $00 $23 $00 $A3 $00 $63 $00 $E3 $00
$13 $00 $93 $00 $53 $00 $D3 $00 $33 $00 $B3 $00 $73 $00 $F3 $00
$0B $00 $8B $00 $4B $00 $CB $00 $2B $00 $AB $00 $6B $00 $EB $00
$1B $00 $9B $00 $5B $00 $DB $00 $3B $00 $BB $00 $7B $00 $FB $00
$07 $00 $87 $00 $47 $00 $C7 $00 $27 $00 $A7 $00 $67 $00 $E7 $00
$17 $00 $97 $00 $57 $00 $D7 $00 $37 $00 $B7 $00 $77 $00 $F7 $00
$0F $00 $8F $00 $4F $00 $CF $00 $2F $00 $AF $00 $6F $00 $EF $00
$1F $00 $9F $00 $5F $00 $DF $00 $3F $00 $BF $00 $7F $00 $FF $00
$00 $01 $01 $01 $02 $01 $03 $01 $04 $01 $05 $01 $06 $01 $07 $01
$08 $01 $09 $01 $0A $01 $0B $01 $0C $01 $0D $01 $0E $01 $0F $01
$10 $01 $11 $01 $12 $01 $13 $01 $14 $01 $15 $01 $16 $01 $17 $01
$18 $01 $19 $01 $1A $01 $1B $01 $1C $01 $1D $01 $1E $01 $1F $01
$20 $01 $21 $01 $22 $01 $23 $01 $24 $01 $25 $01 $26 $01 $27 $01
$28 $01 $29 $01 $2A $01 $2B $01 $2C $01 $2D $01 $2E $01 $2F $01
$30 $01 $31 $01 $32 $01 $33 $01 $34 $01 $35 $01 $36 $01 $37 $01
$38 $01 $39 $01 $3A $01 $3B $01 $3C $01 $3D $01 $3E $01 $3F $01
$40 $01 $41 $01 $42 $01 $43 $01 $44 $01 $45 $01 $46 $01 $47 $01
$48 $01 $49 $01 $4A $01 $4B $01 $4C $01 $4D $01 $4E $01 $4F $01
$50 $01 $51 $01 $52 $01 $53 $01 $54 $01 $55 $01 $56 $01 $57 $01
$58 $01 $59 $01 $5A $01 $5B $01 $5C $01 $5D $01 $5E $01 $5F $01
$60 $01 $61 $01 $62 $01 $63 $01 $64 $01 $65 $01 $66 $01 $67 $01
$68 $01 $69 $01 $6A $01 $6B $01 $6C $01 $6D $01 $6E $01 $6F $01
$70 $01 $71 $01 $72 $01 $73 $01 $74 $01 $75 $01 $76 $01 $77 $01
$78 $01 $79 $01 $7A $01 $7B $01 $7C $01 $7D $01 $7E $01 $7F $01
$80 $01 $81 $01 $82 $01 $83 $01 $84 $01 $85 $01 $86 $01 $87 $01
$88 $01 $89 $01 $8A $01 $8B $01 $8C $01 $8D $01 $8E $01 $8F $01
$90 $01 $91 $01 $92 $01 $93 $01 $94 $01 $95 $01 $96 $01 $97 $01
$98 $01 $99 $01 $9A $01 $9B $01 $9C $01 $9D $01 $9E $01 $9F $01
$A0 $01 $A1 $01 $A2 $01 $A3 $01 $A4 $01 $A5 $01 $A6 $01 $A7 $01
$A8 $01 $A9 $01 $AA $01 $AB $01 $AC $01 $AD $01 $AE $01 $AF $01
$B0 $01 $B1 $01 $B2 $01 $B3 $01 $B4 $01 $B5 $01 $B6 $01 $B7 $01
$B8 $01 $B9 $01 $BA $01 $BB $01 $BC $01 $BD $01 $BE $01 $BF $01
$C0 $01 $C1 $01 $C2 $01 $C3 $01 $C4 $01 $C5 $01 $C6 $01 $C7 $01
$C8 $01 $C9 $01 $CA $01 $CB $01 $CC $01 $CD $01 $CE $01 $CF $01
$D0 $01 $D1 $01 $D2 $01 $D3 $01 $D4 $01 $D5 $01 $D6 $01 $D7 $01
$D8 $01 $D9 $01 $DA $01 $DB $01 $DC $01 $DD $01 $DE $01 $DF $01
$E0 $01 $E1 $01 $E2 $01 $E3 $01 $E4 $01 $E5 $01 $E6 $01 $E7 $01
$E8 $01 $E9 $01 $EA $01 $EB $01 $EC $01 $ED $01 $EE $01 $EF $01
$F0 $01 $F1 $01 $F2 $01 $F3 $01 $F4 $01 $F5 $01 $F6 $01 $F7 $01
$F8 $01 $F9 $01 $FA $01 $FB $01 $FC $01 $FD $01

Using this example we can see the unusual manner in which ID games read bits within bytes. As an example consider the character $80; in binary this is '10000000' However, following this path using the dictionary given above will output the character $01 (00000001). Instead the bits must be read in reverse order. Doing so will lead to the following nodes starting at the root node: 254(root) -> 252 -> 248 -> 240 -> 224 -> 192 -> 128 -> 0(terminal node for characters $00 and $80)

Source code

Some example code is available in various languages showing how to decompress (and in some cases compress) files using the Huffman algorithm.

QuickBasic

'
' DANGEROUS DAVE 2 - IN THE HAUNTED MANSION - Huffman Decompressor
'  - by Napalm with thanks to Adurdin's work on ModKeen
'
' This source is Public Domain, please credit me if you use it.
'
'
DECLARE SUB HUFFDECOMPRESS (INNAME AS STRING, OUTNAME AS STRING)

TYPE NODE
        BIT0 AS INTEGER
        BIT1 AS INTEGER
END TYPE

' Input filename, output filename
HUFFDECOMPRESS "TITLE1.DD2", "TITLE1.PIC"


SUB HUFFDECOMPRESS (INNAME AS STRING, OUTNAME AS STRING) ' by Napalm
	DIM INFILE AS INTEGER, OUTFILE AS INTEGER, I AS INTEGER
	DIM SIG AS LONG, OUTLEN AS LONG, BITMASK AS INTEGER
	DIM CURNODE AS INTEGER, NEXTNODE AS INTEGER
	DIM CHRIN AS STRING * 1, CHROUT AS STRING * 1
	DIM NODES(0 TO 254) AS NODE

	' Open input file
	INFILE = FREEFILE
	OPEN INNAME FOR BINARY ACCESS READ AS INFILE
       
	' Check file signature
	GET INFILE, , SIG
	IF SIG <> &H46465548 THEN ' Hex for: HUFF in little endian
		PRINT "INVALID FILE!"
		EXIT SUB
	END IF
       
	' Get output length
	OUTLEN = 0
	GET INFILE, , OUTLEN
       
	' Read in the Huffman binary tree
	FOR I = 0 TO 254
		GET INFILE, , NODES(I).BIT0
		GET INFILE, , NODES(I).BIT1
	NEXT I

	' Open output file
	OUTFILE = FREEFILE
	OPEN OUTNAME FOR BINARY ACCESS WRITE AS OUTFILE

	' Decompress input data using binary tree
	CURNODE = 254
	DO
		BITMASK = 0
		GET INFILE, , CHRIN
		DO
			' Decide which node to travel down depending on
			'   input bits from CHRIN.
			IF ASC(CHRIN) AND 2 ^ BITMASK THEN
				NEXTNODE = NODES(CURNODE).BIT1
			ELSE
				NEXTNODE = NODES(CURNODE).BIT0
			END IF
		       
			' Is this next node another part of the tree or
			'   is it a end node? Less than 256 mean end node.
			IF NEXTNODE < 256 THEN
			       
				' Get output char from end node and save.
				CHROUT = CHR$(NEXTNODE AND &HFF)
				PUT OUTFILE, , CHROUT
			       
				' Amend output length and start from top of
				'   binary tree.
				OUTLEN = OUTLEN - 1
				CURNODE = 254

			ELSE
				' Travel to next node
				CURNODE = (NEXTNODE AND &HFF)

			END IF

			' Move to next input bit
			BITMASK = BITMASK + 1
		LOOP WHILE BITMASK < 8 AND OUTLEN > 0
		' Loop while we still need to output data
	LOOP WHILE OUTLEN > 0

	' Clean up
	CLOSE OUTFILE
	CLOSE INFILE

END SUB

SUB MAKHUF 'Mak a degenerate huffman tree, store as string huffq
	OPEN "HUFF.DD2" FOR BINARY AS #8
	aq = "HUFF"
	PUT #8, 1, aq
	x = 9
	FOR t = 0 TO 255
		b = t
		va = 0
		vb = 0
		vc = 0
		vd = 0
		ve = 0
		vf = 0
		vg = 0
		vh = 0
		IF b > 127 THEN LET va = va + 1
		b = b MOD 128
		IF b > 63 THEN LET vb = vb + 1
		b = b MOD 64
		IF b > 31 THEN LET vc = vc + 1
		b = b MOD 32
		IF b > 15 THEN LET vd = vd + 1
		b = b MOD 16
		IF b > 7 THEN LET ve = ve + 1
		b = b MOD 8
		IF b > 3 THEN LET vf = vf + 1
		b = b MOD 4
		IF b > 1 THEN LET vg = vg + 1
		b = b MOD 2
		IF b = 1 THEN LET vh = vh + 1
		b = (vh * 128) + (vg * 64) + (vf * 32) + (16 * ve) + (8 * vd) + (4 * vc) + (2 * vb) + va
		aq = MKI$(b)
		PUT #8, x, aq
		x = x + 2
	NEXT t
	FOR t = 0 TO 253
		aq = MKI$(t + 256)
		PUT #8, x, aq
		x = x + 2
	NEXT t
	GET #8, 1, huffq
	CLOSE #8
	KILL "HUFF.DD2"
END SUB

Visual Basic .NET

Huffman tree representation

This class, BinaryTreeNode, represents a binary tree whose branch nodes carry no value, like a Huffman dictionary tree.

Public Class BinaryTreeNode(Of T)
    ' By Fleexy, public domain, credit where credit is due
    Private Branch As Boolean
    Private Children As BinaryTreeNode(Of T)()
    Private HoldValue As T
    Public Sub New(LeafValue As T)
        Branch = False
        HoldValue = LeafValue
    End Sub
    Public Sub New(LeftChild As BinaryTreeNode(Of T), RightChild As BinaryTreeNode(Of T))
        Branch = True
        Children = {LeftChild, RightChild}
    End Sub
    Public ReadOnly Property HasValue As Boolean
        Get
            Return Not Branch
        End Get
    End Property
    Public Property Value As T
        Get
            If Branch Then Throw New InvalidOperationException
            Return HoldValue
        End Get
        Set(value As T)
            If Branch Then Throw New InvalidOperationException
            HoldValue = value
        End Set
    End Property
    Public Property Child(Side As ChildSide) As BinaryTreeNode(Of T)
        Get
            If Not Branch Then Throw New InvalidOperationException
            Return Children(Side)
        End Get
        Set(value As BinaryTreeNode(Of T))
            If Not Branch Then Throw New InvalidOperationException
            Children(Side) = value
        End Set
    End Property
    Public Enum ChildSide As Byte
        Left = 0
        Right = 1
    End Enum
    Public ReadOnly Property Count As Integer
        Get
            If Not Branch Then Return 1
            Return Children(0).Count + Children(1).Count
        End Get
    End Property
    Public ReadOnly Property Depth As Integer
        Get
            If Not Branch Then Return 1
            Return Math.Max(Children(0).Depth, Children(1).Depth) + 1
        End Get
    End Property
    Public Overrides Function ToString() As String
        Return "{Count = " & Count & ", Depth = " & Depth & "}"
    End Function
End Class

Huffman tree reading

This piece of code will read in a stored Huffman dictionary in the format described at the top of this article and store it in a BinaryTreeNode(Of Byte) as shown above.

        ' By Fleexy, public domain, credit where credit is due
        Dim fDict As New IO.FileStream(DictionaryFile, IO.FileMode.Open)
        Dim raw(254) As Tuple(Of UShort, UShort)
        For x = 0 To 254
            raw(x) = Tuple.Create(ReadUShort(fDict), ReadUShort(fDict))
        Next
        fDict.Close()
        Dim GenerateTree As Func(Of UShort, BinaryTreeNode(Of Byte))
        GenerateTree = Function(NextNode As UShort) As BinaryTreeNode(Of Byte)
                           Dim n As Tuple(Of UShort, UShort) = raw(NextNode)
                           Dim a, b As BinaryTreeNode(Of Byte)
                           If n.Item1 < 256 Then
                               a = New BinaryTreeNode(Of Byte)(n.Item1)
                           Else
                               a = GenerateTree(n.Item1 - 256)
                           End If
                           If n.Item2 < 256 Then
                               b = New BinaryTreeNode(Of Byte)(n.Item2)
                           Else
                               b = GenerateTree(n.Item2 - 256)
                           End If
                           Return New BinaryTreeNode(Of Byte)(a, b)
                       End Function
        Dim dict As BinaryTreeNode(Of Byte) = GenerateTree(254)
        fDict.Close()

Haskell

Currently, you have to manually extract chunks using the offsets in AUDIOHED.***.

-- Reading in a Huffman Tree, and using
-- it to decompress individual chunks.
-- Code by QuirkyM, format information
-- taken from http://www.shikadi.net/moddingwiki/Huffman_Compression

-- This code is public domain, but please credit me
-- if you use it.

import Data.Maybe
import Data.Word  -- Fixed-size unsigned integers
import Data.Bits
import Data.List
import Data.Int   -- Fixed-size signed integers

import qualified Data.ByteString as BS

import Data.ByteString (ByteString)

---------------------------------------------------------------------
-- Huffman Tree data structure and related functions

data HuffTree a = HuffNode (HuffTree a) (HuffTree a) | HuffLeaf a deriving (Show,Eq)

-- This just counts the number of output symbols.
huffSize2 :: (HuffTree a) -> Int
huffSize2 (HuffNode l r) = (huffSize2 l) + (huffSize2 r)
huffSize2 (HuffLeaf _)   = 1

---------------------------------------------------------------------
-- Helper Functions

-- Find the first element that satisfies a
-- predicate after mapping.
findFirst :: (a -> b) -> (b -> Bool) -> [a] -> Maybe b
findFirst f p (x:xs)
    | (p $ f x) = Just $ f x
    | otherwise = findFirst f p xs
findFirst f p [] = Nothing

-- Find the first element that satisfies a
-- predicate after mapping, throwing an error
-- if no such element is found.
findFirst' :: String -> (a -> b) -> (b -> Bool) -> [a] -> b
findFirst' err f p xs = fromMaybe (error err) (findFirst f p xs)

---------------------------------------------------------------------
-- Reading Huffman Dictionary

-- Read in a Huffman Tree
-- (If node 254 is not the root, try creating all
--  the trees until you find one with all 256 bytes)
readHuffTree :: ByteString -> HuffTree Word8
readHuffTree bs
    | (huffSize2 hf1 == 256) = hf1
    | otherwise              = findFirst' ("readHuffTree: No Full Huffman Tree found.") (readHuffTree' bs) (\hf -> 256 == huffSize2 hf) [0..253]
    where hf1 = readHuffTree' bs 254

-- Read a Huffman Tree starting from a specific node.
readHuffTree' :: ByteString -> Int -> HuffTree Word8
readHuffTree' bs ind
    | x1 <- BS.index bs (4 * ind)
    , x2 <- BS.index bs (1 + (4 * ind))
    , y1 <- BS.index bs (2 + (4 * ind))
    , y2 <- BS.index bs (3 + (4 * ind))
    = HuffNode (huffChoice x1 x2 bs) (huffChoice y1 y2 bs)

-- Pattern match on one side of a node
huffChoice :: Word8 -> Word8 -> ByteString -> HuffTree Word8
huffChoice x 0x01 bs = readHuffTree' bs (fromIntegral x)
huffChoice c 0x00 bs = HuffLeaf c
huffChoice _ _    _  = error "readHuffTree: Encountered a non-00/01 option."

----------------------------------------------------------
-- Reading Huffman-Compressed Data

-- Take four bytes, and convert them into an
-- unsigned 32-bit integer.
word32le :: (Word8,Word8,Word8,Word8) -> Word32
word32le (x1,x2,x3,x4) = (fi x1) .|. (fi x2 `shiftL` 8) .|. (fi x3 `shiftL` 16) .|. (fi x4 `shiftL` 24)
    where fi = fromIntegral :: Word8 -> Word32

-- Lazy construction/interpretation of Bool List.
-- (Big-endian bits)
readBitsBE :: Word8 -> [Bool] -> [Bool]
readBitsBE x xs = ((rd 7):(rd 6):(rd 5):(rd 4):(rd 3):(rd 2):(rd 1):(rd 0):xs)
    where rd i = testBit x i

-- Lazy construction/interpretation of Bool List.
-- (Little-endian bits)
readBitsLE :: Word8 -> [Bool] -> [Bool]
readBitsLE x xs = ((rd 0):(rd 1):(rd 2):(rd 3):(rd 4):(rd 5):(rd 6):(rd 7):xs)
    where rd i = testBit x i

-- This is probably a bad idea...
-- ...unless the laziness saves some memory...
convertBitsLE :: ByteString -> [Bool]
convertBitsLE bs
    | (Just (x,xs)) <- BS.uncons bs
    = readBitsLE x (convertBitsLE xs)
    | otherwise = []

-- The actual function to read the data
huffRead :: (HuffTree a) -> BS.ByteString -> [a]
huffRead hf bs 
    | (Just (x1,bs1)) <- BS.uncons bs
    , (Just (x2,bs2)) <- BS.uncons bs1
    , (Just (x3,bs3)) <- BS.uncons bs2
    , (Just (x4,bs4)) <- BS.uncons bs3
    = huffRead' 0 (word32le (x1,x2,x3,x4)) hf hf (convertBitsLE bs4)
    | otherwise = error "huffRead: Less than 4 bytes in chunk."

-- Usage:
-- huffRead' <# of bytes output> <total # of bytes> <current position in HuffTree> <full HuffTree> <input bit stream>
-- More Details:
-- <# of bytes output>            : The number of bytes output from reading the input stream.
--                                  When this reaches the total # of bytes, the original data
--                                  has all been extracted
-- <total # of bytes>             : The total number of bytes
-- <current position in HuffTree> : A sub-Huffman Tree, representing the current position
--                                  e.g. If you start with the full tree, and then read a 1,
--                                  you'll get the right subtree.
-- <full HuffTree>                : The full Huffman Tree.
-- <input bit stream>             : A list of Bools, converted from a bytestring.
huffRead' :: Word32 -> Word32 -> (HuffTree a) -> (HuffTree a) -> [Bool] -> [a]
huffRead' cntr mx           _   _ []
    | (cntr == mx) = []
    | otherwise    = error "huffRead: Ran out of bits to read."
huffRead' cntr mx (HuffLeaf x) hf bls = (x:(huffRead' (cntr+1) mx hf hf bls))
huffRead' cntr mx (HuffNode l r) hf (bl:bls)
    | (cntr  == mx) = [] -- end if the number of output bytes has been reached.
    | (bl ==  True) = huffRead' cntr mx r hf bls
    | (bl == False) = huffRead' cntr mx l hf bls
    | otherwise     = error "huffRead: Apparently the law of the excluded middle doesn't hold."

---------------------------------------------------------------------
-- Example IO Operation

-- Take a compressed chunk, and decode it.
huffDecode :: FilePath -> FilePath -> FilePath -> IO ()
huffDecode dct inf outf = do
    { hf <- readHuffTree <$> BS.readFile dct
    ; bs <- BS.readFile inf
    ; op <- return $ BS.pack $ huffRead hf bs
    ; BS.writeFile outf op
    }

-- Take a list of compressed chunks, and decode it.
huffDecodes :: FilePath -> [(FilePath,FilePath)] -> IO ()
huffDecodes dct fps = do
    { hf <- readHuffTree <$> BS.readFile dct
    ; mapM_ (\(inf,outf) -> do {op <- BS.pack <$> huffRead hf <$> BS.readFile inf ; BS.writeFile outf op}) fps
    }

-- Example Usage:
-- huffDecodes "AUDIODCT.CK5" [("song_0001.imf","extr_song_0001.imf"),("song_0002.imf","extr_song_0002.imf"),("song_0003.imf","extr_song_0003.imf")]
-- where song_0001.imf etc... are compressed chunks extracted from "AUDIO.CK5" using the offsets in "AUDIOHED.CK5".

Huffman Compression

Contents

Frequency tables

The Binary Tree

Constructing an optimal tree

Labeling the tree

Standard tree size

The dictionary

Huffman implementation in ID Software games

Trivial ID dictionary

Source code

QuickBasic

Visual Basic .NET

Huffman tree representation

Huffman tree reading

Haskell

Navigation menu

Huffman Compression

Frequency tables

The Binary Tree

Constructing an optimal tree

Labeling the tree

Standard tree size

The dictionary

Huffman implementation in ID Software games

Trivial ID dictionary

Source code

QuickBasic

Visual Basic .NET

Huffman tree representation

Huffman tree reading

Haskell

Navigation menu

Search