LZW Compression

From ModdingWiki
Jump to navigation Jump to search

LZW Compression is a common form of compression used in some early games to compress data and by most early games to compress their executables. It is notable in being one of the first compression methods to not compress on the byte level (Along with Huffman Compression) and for its efficiency.

The basic concept for LZW is universal, though the implementations differ. In essence it involves replacing data strings that have been encountered before with references to already decompressed data. (Known as a 'dictionary') This can be done in a number of ways, the two main approaches differing on whether the dictionary is separate or integrated

The algorithm is named after its creators; the full name is "Lempel–Ziv–Welch", referring to Abraham Lempel, Jacob Ziv, and Terry Welch.

Separate Dictionary Approach

In this approach the dictionary is separate from the data being decompressed, that is it is stored in a separate location in memory. In this it behaves more like one would expect a dictionary to work; when a codeword is found in the data, it is looked up in the dictionary and the corresponding string copied to output. (As an example dictionary entry 42 could represent the string 'life', thus whenever the code '42' is encountered the string 'life' is added to the decompressed data.)

The advantage of this method is that the efficiency of compression increases as the amount of data to compress increases. The following points differ between implementations:

  • The initial dictionary. Just how large the initial dictionary is varies. Some implementations start with no dictioanry at all, others set a number of entries, usually 255, covering all possible 1-byte values.
  • The maximum size of the dictionary. Many older implementations with less resources were forced to cap the dictionary at a certain size, usually a power of two entries long. (512, 1024...) Unlimited implementations are rare as modern methods (e.g. the DEFLATE algorithm.) usually rely on several compression methods at once. Sometimes the dictionary is 'reset' when it reaches too large a size.
  • Whether the codestream is made partly or entirely of codewords. Often the compressed data is made entirely of codewords, even non-repeating strings, which means that initially compression can sometimes be rather poor. Other implementations use codeowords only for repeating strings. Differences in how codewords vs literal are indicated and how dictionaries are built up may occur.

Decoding

This is a general decoding algorithm for separate dictionary LZW. It will need to be altered slightly when dealing with different implementations. Notably it assumes that the codestream is composed entirely of codewords and that the dictionary can keep growing indefinitely.

   Add all roots to the dictionary. Code 0 corresponds to $00, code 1 is $01, etc to $FF;
   Add error, clear, and end-mark flags to the dictionary as appropriate;
   FirstCode [as unknown length binary number] = the first code in the codestream;
   CurMatch [as byte array] = the dictionary entry for FirstCode;
   Output CurMatch;
   Loop until end of codestream {
       CurCode [as unknown length binary number] = next code in the codestream;
       TempMatch [as byte array] allocate;
       If there is an entry for CurCode in the dictionary {
           TempMatch = the dictionary entry for CurCode;
       } If not {
           TempMatch = CurMatch;
           Concatenate the first byte of CurMatch to the end of TempMatch;
       }
       Output TempMatch;
       NewDictEntry [as byte array] = CurMatch;
       Concatenate the first byte of TempMatch to the end of NewDictEntry;
       Try to add NewDictEntry to the dictionary for the first empty key;
       CurMatch = TempMatch;
   }

Commander Keen 1-3

In Commander Keen 1-3 LZW is used to compress the EGALATCH and EGASPRIT files in episode 1 (It can also be used in episodes 2 and 3, but isn't.) The game uses two error-checking methods in this implementation, firstly it reserves two dictionary values, $100 to indicate an error (This is written by the compression program and will make the executable abort.) and $101 to indicate the end of data. (If the program reaches the end of the data without encountering this it will also abort.) The compressed data is also prefixed with a dword giving the decompressed data size, so this can be compared with the output.

This method is a typical separate dictionary approach. It starts with a dictionary of 256 9-bit codewords representing the 8-bit strings $00-$FF (Plus some special cases.) The dictionary is allowed to grow to 4096 entries. The following is the initial dictionary:

       0000 - 00 (character)
       0001 - 01 (character)
           ...
       00FE - FE (character)
       00FF - FF (character)
       0100 - Reserved for errors...
       0101 - Reserved for end of compressed data...
       0102 - (not set)
       0103 - (not set)

It will be immediately noticed that 4096 entries cannot be represented by 9-bit codes but at the least by 12-bit codes. To further conserve space the length of the codewords is increased every time the dictionary grows too large. Thus when it reaches $01FF entries codewords become 10 bits long, at $03FF they are 11 bits and finally at $07FF 12 bits. At $0FFF entries the dictionary stops growing.

The following data is taken from the EGALATCH file from Keen 1. Notice that the first six bytes are ignored. (The first four give the decompressed data size, the next two are the maximum number of bits the LZW decoder will use.) The first few steps of decompression follows.

   0000  80 D3 01 00  0C 00 00 40 - A0 70 31 E9  F8 F8 78 38
   0010  08 08 00 07  FC 39 FF 04 - 5E 41 E1 30  B3 C4 5A 2F

The first code word encountered is 000000000 (First 9 bits) and thus outputs the string $00 - the first dictionary entry. This has set us up to step 4 and now things work slightly differently.

The second code word is the next 9 bits, 100000010, which would point to entry $102. Since this entry is NOT found in the dictionary yet, we will create this entry then output it. Entry $102 is created by taking the previous codeword's string adding to it the first byte of that string. In this case the previous code word's (0) string is $00. $00 + $00 is $00 $00. Entry $102 thus represents the string $00 $00

The next code word is 100000011, which is entry $103, which again doesn't exist. Entry $103 is created just like with $102, except now since the previous codeword is $00 $00, entry $103 is $00 $00 $00.

The next code word is again $103, this IS found in the dictionary and is outputted ($00 $00 $00) however we now create dictionary entry $104 just like $103. (It is $00 $00 $00 $00.) Note that the previous codeword is still $103.

The next code word is $3B. It is outputted and entry $105 created ($00 $00 $00 $3B) Now, the previous codeword is $3B. This pattern continues.

Integrated Dictionary Approach

In this approach the dictionary is the decompressed data itself. The codewords do not represent an entry in the dictionary structure, but rather directions in the decompressed code as to where a repeated string is located. That is a repeat string may well be represented by a codeword that states 'copy seven bytes from byte 132 in the decompressed data'

The advantage of this approach is that it doesn't require a separate construct for the dictionary and can just use the already decompressed data. The downside is that it will always need some method to distinguish literals and codewords, and, since codewords are nearly always of a fixed length there is an inherent limit both to how long the copied string can be and where it can be read from. this means that eventually the compression efficiency will level off and stop improving.

There This is often called the 'sliding window' and it represents the data that can be 'reached' by the codewords. It is named as such because it is of a fixed length and 'slides' along the output stream as it gets longer. The following features differ between implementations:

  • Differentiating codewords from literals. There must be a way to tell codewords apart from data that is just to be read and outputted. Sometimes this is integrated into the codewords themselves, but more often the 'flag' precedes a codeword. The flag may indicate only codewords or both codewords and literals. ('Following data is made of two codewords and six literals', etc.)
  • Codeword format. Most implementations use codewords of a fixed format that must encode both the length of data to copy and the location to copy it from. Codewords are usually two or four bytes long.
  • Zero point location. Different implementations use different places in the output as zero. If the start of the code is used as zero only the first x bytes of the output can be used as a reference. Most implementations use a more complex, but seldom less effective 'sliding window'; the zero point is the start of the data until the data becomes too long at which point it moves forward so that the most recent x bytes of output can be used. It is also possible for zero to be the most recent byte with all locations being 'x bytes from the end', which also produces a sliding window. Finally it may be possible to have both negative and positive locations.
  • Sliding window. The nature of this is dictated by the format of the codewords. Common sizes are 1KB, 2KB or 4KB. The window will always be present, but it may not 'slide' if the implementation uses a fixed location as the zero point.

LZEXE

Many vintage executables are compressed with the program LZEXE, interesting in that the compressed file contains its own decompressor, meaning that it is in essence a self-extracting archive. However the unique feature of LZEXE executables is that they extract the compressed data to memory and run it. To the user this is indistinguishable from the decompressed executable, though it takes slightly longer to start up and takes up much less space.

UNLZEXE can be used to extract the decompressed executables from this which will run perfectly with other game files. It can be obtained here: http://www.dosclassics.com/download/198 It is currently unknown specifically how the LZW compression is implemented in this case, but with the source code for decompression is available.

The LZEXE compression is similar to the SoftDisk Library Approach described below, but it uses UINT16LE values instead of byte values to store the flag bits and the sliding window has a size of 8192 (0x2000) bytes. Also, the flag bits have different meanings (you could argue that they are in fact Huffman codes):

   1 -> copy 1 literal byte
  10 -> next two bytes contain length and distance
0000 -> length is 2, next byte contains distance 
1000 -> length is 3, next byte contains distance 
0100 -> length is 4, next byte contains distance 
1100 -> length is 5, next byte contains distance 

The real distance value is always a signed 16 bit integer (you can use unsigned but then you have to bitwise-and the resulting index value with 0xFFFF). If the length is given by the flag bits/Huffman code, the real distance is b | 0xFF00 or b - 256 where b is the byte value read from the file. If the length value is not given, read the byte values b0 and b1 and calculate length and distance like this:

length = (b1 mod 8)+2
distance = b0 + (b1/8)*256 - 8192

or

length = (b1 & 0x07)+2
distance = b0 | ((b1 & 0xF8)<<5) | 0xE000

If the length value calculated from b1 is 2, this indicates that another byte value b2 must be read. Depending in the value of b2, one of three things can happen:

b2 = 0:
   end of compressed data - stop decompressing
b2 = 1:
   end of segment
   (decompressor may write contents of buffer to output)
   set length to 0 or jump to the part where the decompressor reads the next flag bits/Huffman code
otherwise:
   set length to b2+1

Now the decompressor must only add distance to the current buffer index (since distance is negative, the index goes backwards) and copy length bytes from there to the current index:

WHILE length > 0
   buffer[index] = buffer[index+distance]
   index = index + 1
   length = length - 1
END WHILE

Please refer to the UNLZEXE source code for further information.

SoftDisk Library Approach

This is used as the first form of compression in the Softdisk Library Format. Flags are 1-byte long and divide the datastream into segments of eight 'values' which can be either literals or codewords. Codewords are 2 bytes long, literals 1 byte. (Therefore there will be a flag byte every 8 to 16 bytes of data.) The value of each bit (In little endian) indicates whether a value will be a literal (1) or codeword (0) Thus a value of 199 (11000111 in binary) indicates three codewords, three literals and two codewords in that order. (Total of 13 bytes.)

Literals are sequences that have never been seen in the datastream before, they cannot be compressed and are thus the same in the compressed and decompressed datastreams. (If the data is text they become quite obvious.) Any string less than 3 bytes long that has not been read before or cannot be pointed to (See below) will be stored as literals.

Codewords are reference to data that has already been read. They are two bytes long, with the first 12 bits giving the location to read data from and the last 4 bits giving the length of data to read.

The lower nybble (4 bits) of the second codeword byte holds the length of repeat data to read minus three. (This makes sense, the shortest sequence it makes sense to code is three bytes which can be given the value 0.) It will be immediately apparent that the maximum length of repeated data that can be stored as a codeword is 18 bytes.

The high nybble of the second byte is multiplied by 16 then added to the first byte to give the location of the data to read in the 'sliding window' minus 19. (This is due to the way the decompression is set up in memory.)

It will be immediately obvious that the codewords can encode values between +-2048, or about 2KB. If the decompressed data is less than 2KB in size then zero is the start of the data, if it is larger than it is 2048 bytes from the data end.

It will be noted that it is probable that the compressed datastream will not be perfectly divisible by flag bytes. In this case the unused bits are set to 0. The decompressor stops when the decompressed data size is equal to the value given in the chunk header. (If it runs out of data it will abort.)

As a simple example the sentence 'I am Sam. Sam I am!' will be compressed to:

FF				Flag byte, 8 literals follow
49 20 61 6D 20 53 61 6D	'I am Sam' as literals
2B				Flag byte, 2L, P, L, P, L 2Blanks ($2B = 43 = 00101011)
2E 20				' .' as literals
F2 F0				codeword, read 0 + 3 = 3 bytes from $FF2, or -14 + 19 = 5 in the data. This is 'Sam'
20				' ' as literal
ED F1				codeword, read 1 + 3 = 4 bytes from $FED or -19 + 19 = 1 in the data. This is 'I am'
21				'!' as literal

Other LZW article pages

These pages detail specific LZW algorithms used by certain games:

Source code

Some example code is available in various languages showing how to decompress (and in some cases compress) files using the Keen's LZW algorithm in its various implementations.

Keen 1-3 Implementation

These segments of code work with the Keen 1-3 implementation only and will not for example decompress LZEXE compressed executables.

QuickBasic

DECLARE FUNCTION READBITS% (FILE AS INTEGER, NUMBITS AS INTEGER)
DECLARE SUB LZWDECOMPRESS (INNAME AS STRING, OUTNAME AS STRING)
DECLARE SUB LZWOUTPUT (FILE AS INTEGER, DIC AS INTEGER, CHAR AS INTEGER)
'
' KEEN1 Compatible LZW Decompressor (Lempel-Ziv-Welch)
' - by Napalm with thanks to Adurdin's work on ModKeen
'
' This source is Public Domain
'
'

' Allocate dictionary
DIM LZDIC(0 TO 4095) AS INTEGER
DIM LZCHR(0 TO 4095) AS INTEGER

' Test Function
LZWDECOMPRESS "EGALATCH.CK1", "EGALATCH.DAT"


SUB LZWDECOMPRESS (INNAME AS STRING, OUTNAME AS STRING)
        SHARED LZDIC() AS INTEGER, LZCHR() AS INTEGER
        DIM INFILE AS INTEGER, OUTFILE AS INTEGER, I AS INTEGER
        DIM BITLEN AS INTEGER, CURPOS AS INTEGER
        DIM CW AS INTEGER, PW AS INTEGER, C AS INTEGER, P AS INTEGER
        DIM CHECK AS INTEGER

        ' Open files for input and output
        INFILE = FREEFILE
        OPEN INNAME FOR BINARY ACCESS READ AS INFILE
        OUTFILE = FREEFILE
        OPEN OUTNAME FOR BINARY ACCESS WRITE AS OUTFILE
        SEEK INFILE, 7

       
        ' Fill dictionary with starting values
        FOR I = 0 TO 4095
                LZDIC(I) = -1
                IF I < 256 THEN
                        LZCHR(I) = I
                ELSE
                        LZCHR(I) = -1
                END IF
        NEXT I

        ' Decompress input stream to output stream
        BITLEN = 9
        CURPOS = 258
        CW = READBITS(INFILE, BITLEN)
        LZWOUTPUT OUTFILE, LZDIC(CW), LZCHR(CW)
       
        WHILE CW <> &H100 AND CW <> &H101
                PW = CW
                CW = READBITS(INFILE, BITLEN)
                IF CW <> &H100 AND CW <> &H101 THEN
                        P = PW
                        CHECK = (LZCHR(CW) <> -1)
                       
                        IF CHECK THEN
                                TMP = CW
                        ELSE
                                TMP = PW
                        END IF
                        WHILE LZDIC(TMP) <> -1
                                TMP = LZDIC(TMP)
                        WEND
                        C = LZCHR(TMP)
                       
                        IF CHECK THEN
                                LZWOUTPUT OUTFILE, LZDIC(CW), LZCHR(CW)
                        ELSE
                                LZWOUTPUT OUTFILE, P, C
                        END IF
                       
                        IF CURPOS < 4096 THEN
                                LZDIC(CURPOS) = P
                                LZCHR(CURPOS) = C
                                CURPOS = CURPOS + 1
                                IF CURPOS = (2 ^ BITLEN - 1) AND BITLEN < 12 THEN
                                        BITLEN = BITLEN + 1
                                END IF
                        END IF

                END IF
        WEND
       
        ' Close files
        CLOSE OUTFILE
        CLOSE INFILE
END SUB

SUB LZWOUTPUT (FILE AS INTEGER, DIC AS INTEGER, CHAR AS INTEGER)
        SHARED LZDIC() AS INTEGER, LZCHR() AS INTEGER
        DIM LZSTK(0 TO 127) AS STRING * 1
        DIM X AS INTEGER, SP AS INTEGER
        DIM LDIC AS INTEGER, LCHAR AS INTEGER

        LCHAR = CHAR
        LDIC = DIC
        SP = 0
        X = 1

        DO
                IF SP >= 128 THEN
                        PRINT "LZW: Stack Overflow!"
                        END
                END IF
                LZSTK(SP) = CHR$(LCHAR)
                SP = SP + 1
                IF LDIC <> -1 THEN
                        LCHAR = LZCHR(LDIC)
                        LDIC = LZDIC(LDIC)
                ELSE
                        X = 0
                END IF
        LOOP WHILE X
       
        WHILE SP <> 0
                SP = SP - 1
                PUT FILE, , LZSTK(SP)
        WEND
END SUB

FUNCTION READBITS% (FILE AS INTEGER, NUMBITS AS INTEGER)
        STATIC BITDAT AS STRING * 1, BITPOS AS INTEGER
        DIM BITVAL AS INTEGER, BIT AS INTEGER

        BITVAL = 0
        FOR BIT = (NUMBITS - 1) TO 0 STEP -1
                IF BITPOS = 0 THEN
                        GET FILE, , BITDAT
                        BITPOS = 7
                ELSE
                        BITPOS = BITPOS - 1
                END IF
                IF ASC(BITDAT) AND 2 ^ BITPOS THEN
                        BITVAL = BITVAL OR 2 ^ BIT
                END IF
        NEXT BIT

        READBITS% = BITVAL
END FUNCTION

FreeBasic

This code does not suffer from the 64K memory limit imposed by QuickBasic and so is less efficient, but runs faster. It can be compiled with FreeBasic compiler using the -lang=qb switch. Aside from memory concerns, all code here is compatible with QuickBasic.

The code before the subroutine is used to make a string containing the bit expansion of all values from 0 to 255. The subroutine takes a filename, reads the entire file into memory then expands each bit of data to a byte using the aforesaid string as Basic cannot deal with bits directly. cw$ is codeword, pw$ is the previous codeword, lun is the lowest dictionary entry that is empty, p is the location in the compressed data stream and bl is the length of codes in bits (Starting at nine bits increasing to 12)

The dictionary is set before decompression. The first 258 are the starting dictionary, the remainder are cleared. (It is vital to reset the dictionary for each file) An error occurs if entry 256 is found in the data, 'distrupt' is printed when the newest dictionary entry is not the lowest possible entry (This shouldn't happen but is possible.) Decompression ends at encountering entry 257, or when there is no more data to read.

DECLARE SUB LZWDEC (lfn AS STRING)

x$ = ""
FOR l = 0 TO 255
 IF (l AND 128) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
 IF (l AND 64) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
 IF (l AND 32) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
 IF (l AND 16) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
 IF (l AND 8) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
 IF (l AND 4) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
 IF (l AND 2) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
 IF (l AND 1) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
NEXT l
LZEDEC "EGALATCH.CK1"
END


'_________________________________________________
SUB LZWDEC (lfn AS STRING) ' Decompress LZW data
'_________________________________________________
DIM lzw(0 TO 4095) AS STRING
PRINT lfn; " is LZW compressed, decompressing...";
OPEN folder + lfn FOR BINARY AS #9
y$ = SPACE$(LOF(9))
GET #9, 1, y$
CLOSE #9
z$ = ""
FOR l = 7 TO LEN(y$)
 z$ = z$ + MID$(x$, (ASC(MID$(y$, l, 1)) * 8) + 1, 8)
NEXT l

bl = 9
lun = 258
p = 1
cw$ = ""
y$ = ""
FOR l = 0 TO 4095
 IF l < 256 THEN lzw(l) = CHR$(l) ELSE lzw(l) = ""
NEXT l
DO
 IF lun = 511 THEN bl = 10
 IF lun = 1023 THEN bl = 11
 IF lun = 2047 THEN bl = 12
 pw$ = cw$
 u$ = MID$(z$, p, bl)
 p = p + bl
 y = 0
 FOR l = 1 TO bl
  IF MID$(u$, bl - l + 1, 1) = "1" THEN y = y + (2 ^ (l - 1))
 NEXT l
 IF y = 256 THEN
  PRINT "LZW error in Keen data!"
  OPEN "ERROR.DAT" FOR OUTPUT AS #9
  PRINT #9, y$;
  CLOSE
  END
 END IF
 IF y = 257 THEN EXIT DO
 IF cw$ = "" THEN
  cw$ = lzw(y)
  y$ = y$ + cw$
 ELSE
  IF lun < 4096 THEN
   IF lzw(y) = "" THEN
    cw$ = pw$ + LEFT$(pw$, 1)
    lzw(y) = cw$
    y$ = y$ + cw$
    IF y <> lun THEN PRINT "Disrupt!"
    lun = y + 1
   ELSE
    cw$ = lzw(y)
    y$ = y$ + cw$
    lzw(lun) = pw$ + LEFT$(cw$, 1)
    lun = lun + 1
   END IF
  ELSE
   IF lzw(y) = "" THEN
    y$ = y$ + cw$
   ELSE
    cw$ = lzw(y)
    y$ = y$ + cw$
   END IF
  END IF
 END IF
LOOP WHILE p < LEN(z$)
IF y = 257 THEN PRINT "done" ELSE PRINT "out of data."
OPEN folder + LEFT$(lfn, 4) + extq FOR OUTPUT AS #9
PRINT #9, y$;
CLOSE #9
END SUB

Visual Basic .NET

This implementation uses high-level elements such as lambdas, anonymous arrays, and strict types. It must be compiled for the Microsoft .NET Framework v4.5 in Visual Studio 2012. It has the advantages of running in non-DosBox Windows and using .NET streams for simple reusability. It can also be used for higher-bit LZW.

Decompression

Decompressing is very fast.

    Sub DecompressLZW(Data As IO.Stream, MaxBits As Byte, Output As IO.Stream)
        ' This source is by Fleexy and is in the public domain. If used, please note it as such.
        Dim dict As New List(Of Byte())
        For x = 0 To 255
            dict.Add({x})
        Next
        dict.Add({})
        dict.Add(Nothing)
        Dim usebits As Byte = 9
        Dim bpos As Long
        Dim bits As New List(Of Byte)
        Do Until Data.Position = Data.Length
            Dim b, ub As Byte
            b = Data.ReadByte
            ub = b
            For x = 7 To 0 Step -1
                If ub - (2 ^ x) >= 0 Then
                    ub -= (2 ^ x)
                    bits.Add(1)
                Else
                    bits.Add(0)
                End If
            Next
        Loop
        Dim GetCode = Function() As UInteger
                          Dim u As UInteger
                          For x = usebits To 1 Step -1
                              If bits(bpos) = 1 Then u += (2 ^ (x - 1))
                              bpos += 1
                          Next
                          Return u
                      End Function
        Dim OutputCode = Sub(DecompData As Byte())
                             Dim n As UInteger = DecompData.Length
                             Output.Write(DecompData, 0, n)
                         End Sub
        Dim AddToDict = Sub(Entry As Byte())
                            If dict.Count < (2 ^ MaxBits) Then
                                dict.Add(Entry)
                                If dict.Count = (2 ^ usebits) - 1 Then usebits = Math.Min(usebits + 1, MaxBits)
                            End If
                        End Sub
        Dim fcode As UInteger = GetCode()
        Dim match As Byte() = dict(fcode)
        OutputCode(match)
        Do
            Dim ncode As UInteger = GetCode()
            If ncode = 257 Then Exit Do
            If ncode = 256 Then Throw New Exception
            Dim nmatch As Byte()
            If ncode < dict.Count Then
                nmatch = dict(ncode)
            Else
                nmatch = match.Concat({match(0)}).ToArray
            End If
            OutputCode(nmatch)
            AddToDict(match.Concat({nmatch(0)}).ToArray)
            match = nmatch
        Loop
    End Sub
Compression

Compression is more difficult; consulting the dictionary for a byte array takes more time. The speed of this algorithm may be unacceptable.

    Sub CompressLZW(Data As IO.Stream, MaxBits As Byte, Output As IO.Stream)
        ' This source is by Fleexy and is in the public domain.
        Dim bits As New List(Of Byte)
        Dim dict As New List(Of Byte())
        For x = 0 To 255
            dict.Add({x})
        Next
        dict.Add({})
        dict.Add({})
        Dim usebits As Byte = 9
        Dim PutCode = Sub(Code As UInteger)
                          For x = usebits To 1 Step -1
                              If Code - (2 ^ (x - 1)) >= 0 Then
                                  Code -= (2 ^ (x - 1))
                                  bits.Add(1)
                              Else
                                  bits.Add(0)
                              End If
                          Next
                      End Sub
        Dim AddToDict = Function(Entry As Byte()) As Boolean
                            If dict.Count < 2 ^ MaxBits Then
                                dict.Add(Entry)
                                If dict.Count = 2 ^ usebits Then usebits = Math.Min(usebits + 1, MaxBits)
                                Return True
                            Else
                                Return False
                            End If
                        End Function
        Dim FindCode = Function(Bytes As Byte()) As UInteger
                           For x = 1 To dict.Count
                               If dict(x - 1).Count = Bytes.Count Then
                                   If dict(x - 1).SequenceEqual(Bytes) Then Return x - 1
                               End If
                           Next
                           Throw New NotFiniteNumberException
                       End Function
        Dim DictContains = Function(Bytes As Byte()) As Boolean
                               For x = 1 To dict.Count
                                   If dict(x - 1).Length = Bytes.Length Then
                                       If dict(x - 1).SequenceEqual(Bytes) Then Return True
                                   End If
                               Next
                               Return False
                           End Function
        Dim match As Byte() = {}
        Do Until Data.Position = Data.Length
            Dim nbyte As Byte = Data.ReadByte
            Dim nmatch As Byte() = match.Concat({nbyte}).ToArray
            If DictContains(nmatch) Then
                match = nmatch
            Else
                PutCode(FindCode(match))
                AddToDict(nmatch)
                match = {nbyte}
            End If
        Loop
        PutCode(FindCode(match))
        PutCode(257)
        Do Until bits.LongCount Mod 8L = 0L
            bits.Add(0)
        Loop
        For x = 1 To CInt(bits.LongCount / 8L)
            Dim b As Byte = 0
            For y = 0 To 7
                b += bits((x - 1) * 8 + y) * (2 ^ (7 - y))
            Next
            Output.WriteByte(b)
        Next
    End Sub