LZW Compression
Format type | Compression algorithm |
---|---|
Type | Stream |
I/O unit size | 1-bit |
Games |
LZW Compression is a common form of compression used in some early games to compress data and by most early games to compress their executables. It is notable in being one of the first compression methods to not compress on the byte level (Along with Huffman Compression) and for its efficiency.
The basic concept for LZW is universal, though the implementations differ. In essence it involves replacing data strings that have been encountered before with references to already decompressed data. (Known as a 'dictionary') This can be done in a number of ways, the two main approaches differing on whether the dictionary is separate or integrated
The algorithm is named after its creators; the full name is "Lempel–Ziv–Welch", referring to Abraham Lempel, Jacob Ziv, and Terry Welch.
Separate Dictionary Approach
In this approach the dictionary is separate from the data being decompressed, that is it is stored in a separate location in memory. In this it behaves more like one would expect a dictionary to work; when a codeword is found in the data, it is looked up in the dictionary and the corresponding string copied to output. (As an example dictionary entry 42 could represent the string 'life', thus whenever the code '42' is encountered the string 'life' is added to the decompressed data.)
The advantage of this method is that the efficiency of compression increases as the amount of data to compress increases. The following points differ between implementations:
- The initial dictionary. Just how large the initial dictionary is varies. Some implementations start with no dictioanry at all, others set a number of entries, usually 255, covering all possible 1-byte values.
- The maximum size of the dictionary. Many older implementations with less resources were forced to cap the dictionary at a certain size, usually a power of two entries long. (512, 1024...) Unlimited implementations are rare as modern methods (e.g. the DEFLATE algorithm.) usually rely on several compression methods at once. Sometimes the dictionary is 'reset' when it reaches too large a size.
- Whether the codestream is made partly or entirely of codewords. Often the compressed data is made entirely of codewords, even non-repeating strings, which means that initially compression can sometimes be rather poor. Other implementations use codeowords only for repeating strings. Differences in how codewords vs literal are indicated and how dictionaries are built up may occur.
Decoding
This is a general decoding algorithm for separate dictionary LZW. It will need to be altered slightly when dealing with different implementations. Notably it assumes that the codestream is composed entirely of codewords and that the dictionary can keep growing indefinitely.
Add all roots to the dictionary. Code 0 corresponds to $00, code 1 is $01, etc to $FF; Add error, clear, and end-mark flags to the dictionary as appropriate; FirstCode [as unknown length binary number] = the first code in the codestream; CurMatch [as byte array] = the dictionary entry for FirstCode; Output CurMatch; Loop until end of codestream { CurCode [as unknown length binary number] = next code in the codestream; TempMatch [as byte array] allocate; If there is an entry for CurCode in the dictionary { TempMatch = the dictionary entry for CurCode; } If not { TempMatch = CurMatch; Concatenate the first byte of CurMatch to the end of TempMatch; } Output TempMatch; NewDictEntry [as byte array] = CurMatch; Concatenate the first byte of TempMatch to the end of NewDictEntry; Try to add NewDictEntry to the dictionary for the first empty key; CurMatch = TempMatch; }
Commander Keen 1-3
In Commander Keen 1-3 LZW is used to compress the EGALATCH and EGASPRIT files in episode 1 (It can also be used in episodes 2 and 3, but isn't.) The game uses two error-checking methods in this implementation, firstly it reserves two dictionary values, $100 to indicate an error (This is written by the compression program and will make the executable abort.) and $101 to indicate the end of data. (If the program reaches the end of the data without encountering this it will also abort.) The compressed data is also prefixed with a dword giving the decompressed data size, so this can be compared with the output.
This method is a typical separate dictionary approach. It starts with a dictionary of 256 9-bit codewords representing the 8-bit strings $00-$FF (Plus some special cases.) The dictionary is allowed to grow to 4096 entries. The following is the initial dictionary:
0000 - 00 (character) 0001 - 01 (character) ... 00FE - FE (character) 00FF - FF (character) 0100 - Reserved for errors... 0101 - Reserved for end of compressed data... 0102 - (not set) 0103 - (not set)
It will be immediately noticed that 4096 entries cannot be represented by 9-bit codes but at the least by 12-bit codes. To further conserve space the length of the codewords is increased every time the dictionary grows too large. Thus when it reaches $01FF entries codewords become 10 bits long, at $03FF they are 11 bits and finally at $07FF 12 bits. At $0FFF entries the dictionary stops growing.
The following data is taken from the EGALATCH file from Keen 1. Notice that the first six bytes are ignored. (The first four give the decompressed data size, the next two are the maximum number of bits the LZW decoder will use.) The first few steps of decompression follows.
0000 80 D3 01 00 0C 00 00 40 - A0 70 31 E9 F8 F8 78 38 0010 08 08 00 07 FC 39 FF 04 - 5E 41 E1 30 B3 C4 5A 2F
The first code word encountered is 000000000 (First 9 bits) and thus outputs the string $00 - the first dictionary entry. This has set us up to step 4 and now things work slightly differently.
The second code word is the next 9 bits, 100000010, which would point to entry $102. Since this entry is NOT found in the dictionary yet, we will create this entry then output it. Entry $102 is created by taking the previous codeword's string adding to it the first byte of that string. In this case the previous code word's (0) string is $00. $00 + $00 is $00 $00. Entry $102 thus represents the string $00 $00
The next code word is 100000011, which is entry $103, which again doesn't exist. Entry $103 is created just like with $102, except now since the previous codeword is $00 $00, entry $103 is $00 $00 $00.
The next code word is again $103, this IS found in the dictionary and is outputted ($00 $00 $00) however we now create dictionary entry $104 just like $103. (It is $00 $00 $00 $00.) Note that the previous codeword is still $103.
The next code word is $3B. It is outputted and entry $105 created ($00 $00 $00 $3B) Now, the previous codeword is $3B. This pattern continues.
Integrated Dictionary Approach
In this approach the dictionary is the decompressed data itself. The codewords do not represent an entry in the dictionary structure, but rather directions in the decompressed code as to where a repeated string is located. That is a repeat string may well be represented by a codeword that states 'copy seven bytes from byte 132 in the decompressed data'
The advantage of this approach is that it doesn't require a separate construct for the dictionary and can just use the already decompressed data. The downside is that it will always need some method to distinguish literals and codewords, and, since codewords are nearly always of a fixed length there is an inherent limit both to how long the copied string can be and where it can be read from. this means that eventually the compression efficiency will level off and stop improving.
There This is often called the 'sliding window' and it represents the data that can be 'reached' by the codewords. It is named as such because it is of a fixed length and 'slides' along the output stream as it gets longer. The following features differ between implementations:
- Differentiating codewords from literals. There must be a way to tell codewords apart from data that is just to be read and outputted. Sometimes this is integrated into the codewords themselves, but more often the 'flag' precedes a codeword. The flag may indicate only codewords or both codewords and literals. ('Following data is made of two codewords and six literals', etc.)
- Codeword format. Most implementations use codewords of a fixed format that must encode both the length of data to copy and the location to copy it from. Codewords are usually two or four bytes long.
- Zero point location. Different implementations use different places in the output as zero. If the start of the code is used as zero only the first x bytes of the output can be used as a reference. Most implementations use a more complex, but seldom less effective 'sliding window'; the zero point is the start of the data until the data becomes too long at which point it moves forward so that the most recent x bytes of output can be used. It is also possible for zero to be the most recent byte with all locations being 'x bytes from the end', which also produces a sliding window. Finally it may be possible to have both negative and positive locations.
- Sliding window. The nature of this is dictated by the format of the codewords. Common sizes are 1KB, 2KB or 4KB. The window will always be present, but it may not 'slide' if the implementation uses a fixed location as the zero point.
LZEXE
Many vintage executables are compressed with the program LZEXE, interesting in that the compressed file contains its own decompressor, meaning that it is in essence a self-extracting archive. However the unique feature of LZEXE executables is that they extract the compressed data to memory and run it. To the user this is indistinguishable from the decompressed executable, though it takes slightly longer to start up and takes up much less space.
UNLZEXE can be used to extract the decompressed executables from this which will run perfectly with other game files. It can be obtained here: http://www.dosclassics.com/download/198 It is currently unknown specifically how the LZW compression is implemented in this case, but with the source code for decompression is available.
The LZEXE compression is similar to the SoftDisk Library Approach described below, but it uses UINT16LE values instead of byte values to store the flag bits and the sliding window has a size of 8192 (0x2000) bytes. Also, the flag bits have different meanings (you could argue that they are in fact Huffman codes):
1 -> copy 1 literal byte 10 -> next two bytes contain length and distance 0000 -> length is 2, next byte contains distance 1000 -> length is 3, next byte contains distance 0100 -> length is 4, next byte contains distance 1100 -> length is 5, next byte contains distance
The real distance value is always a signed 16 bit integer (you can use unsigned but then you have to bitwise-and the resulting index value with 0xFFFF). If the length is given by the flag bits/Huffman code, the real distance is b | 0xFF00 or b - 256 where b is the byte value read from the file. If the length value is not given, read the byte values b0 and b1 and calculate length and distance like this:
length = (b1 mod 8)+2 distance = b0 + (b1/8)*256 - 8192
or
length = (b1 & 0x07)+2 distance = b0 | ((b1 & 0xF8)<<5) | 0xE000
If the length value calculated from b1 is 2, this indicates that another byte value b2 must be read. Depending in the value of b2, one of three things can happen:
b2 = 0: end of compressed data - stop decompressing b2 = 1: end of segment (decompressor may write contents of buffer to output) set length to 0 or jump to the part where the decompressor reads the next flag bits/Huffman code otherwise: set length to b2+1
Now the decompressor must only add distance to the current buffer index (since distance is negative, the index goes backwards) and copy length bytes from there to the current index:
WHILE length > 0 buffer[index] = buffer[index+distance] index = index + 1 length = length - 1 END WHILE
Please refer to the UNLZEXE source code for further information.
SoftDisk Library Approach
This is used as the first form of compression in the Softdisk Library Format. Flags are 1-byte long and divide the datastream into segments of eight 'values' which can be either literals or codewords. Codewords are 2 bytes long, literals 1 byte. (Therefore there will be a flag byte every 8 to 16 bytes of data.) The value of each bit (In little endian) indicates whether a value will be a literal (1) or codeword (0) Thus a value of 199 (11000111 in binary) indicates three codewords, three literals and two codewords in that order. (Total of 13 bytes.)
Literals are sequences that have never been seen in the datastream before, they cannot be compressed and are thus the same in the compressed and decompressed datastreams. (If the data is text they become quite obvious.) Any string less than 3 bytes long that has not been read before or cannot be pointed to (See below) will be stored as literals.
Codewords are reference to data that has already been read. They are two bytes long, with the first 12 bits giving the location to read data from and the last 4 bits giving the length of data to read.
The lower nybble (4 bits) of the second codeword byte holds the length of repeat data to read minus three. (This makes sense, the shortest sequence it makes sense to code is three bytes which can be given the value 0.) It will be immediately apparent that the maximum length of repeated data that can be stored as a codeword is 18 bytes.
The high nybble of the second byte is multiplied by 16 then added to the first byte to give the location of the data to read in the 'sliding window' minus 19. (This is due to the way the decompression is set up in memory.)
It will be immediately obvious that the codewords can encode values between +-2048, or about 2KB. If the decompressed data is less than 2KB in size then zero is the start of the data, if it is larger than it is 2048 bytes from the data end.
It will be noted that it is probable that the compressed datastream will not be perfectly divisible by flag bytes. In this case the unused bits are set to 0. The decompressor stops when the decompressed data size is equal to the value given in the chunk header. (If it runs out of data it will abort.)
As a simple example the sentence 'I am Sam. Sam I am!' will be compressed to:
FF Flag byte, 8 literals follow 49 20 61 6D 20 53 61 6D 'I am Sam' as literals 2B Flag byte, 2L, P, L, P, L 2Blanks ($2B = 43 = 00101011) 2E 20 ' .' as literals F2 F0 codeword, read 0 + 3 = 3 bytes from $FF2, or -14 + 19 = 5 in the data. This is 'Sam' 20 ' ' as literal ED F1 codeword, read 1 + 3 = 4 bytes from $FED or -19 + 19 = 1 in the data. This is 'I am' 21 '!' as literal
Other LZW article pages
These pages detail specific LZW algorithms used by certain games:
Source code
Some example code is available in various languages showing how to decompress (and in some cases compress) files using the Keen's LZW algorithm in its various implementations.
Keen 1-3 Implementation
These segments of code work with the Keen 1-3 implementation only and will not for example decompress LZEXE compressed executables.
QuickBasic
DECLARE FUNCTION READBITS% (FILE AS INTEGER, NUMBITS AS INTEGER)
DECLARE SUB LZWDECOMPRESS (INNAME AS STRING, OUTNAME AS STRING)
DECLARE SUB LZWOUTPUT (FILE AS INTEGER, DIC AS INTEGER, CHAR AS INTEGER)
'
' KEEN1 Compatible LZW Decompressor (Lempel-Ziv-Welch)
' - by Napalm with thanks to Adurdin's work on ModKeen
'
' This source is Public Domain
'
'
' Allocate dictionary
DIM LZDIC(0 TO 4095) AS INTEGER
DIM LZCHR(0 TO 4095) AS INTEGER
' Test Function
LZWDECOMPRESS "EGALATCH.CK1", "EGALATCH.DAT"
SUB LZWDECOMPRESS (INNAME AS STRING, OUTNAME AS STRING)
SHARED LZDIC() AS INTEGER, LZCHR() AS INTEGER
DIM INFILE AS INTEGER, OUTFILE AS INTEGER, I AS INTEGER
DIM BITLEN AS INTEGER, CURPOS AS INTEGER
DIM CW AS INTEGER, PW AS INTEGER, C AS INTEGER, P AS INTEGER
DIM CHECK AS INTEGER
' Open files for input and output
INFILE = FREEFILE
OPEN INNAME FOR BINARY ACCESS READ AS INFILE
OUTFILE = FREEFILE
OPEN OUTNAME FOR BINARY ACCESS WRITE AS OUTFILE
SEEK INFILE, 7
' Fill dictionary with starting values
FOR I = 0 TO 4095
LZDIC(I) = -1
IF I < 256 THEN
LZCHR(I) = I
ELSE
LZCHR(I) = -1
END IF
NEXT I
' Decompress input stream to output stream
BITLEN = 9
CURPOS = 258
CW = READBITS(INFILE, BITLEN)
LZWOUTPUT OUTFILE, LZDIC(CW), LZCHR(CW)
WHILE CW <> &H100 AND CW <> &H101
PW = CW
CW = READBITS(INFILE, BITLEN)
IF CW <> &H100 AND CW <> &H101 THEN
P = PW
CHECK = (LZCHR(CW) <> -1)
IF CHECK THEN
TMP = CW
ELSE
TMP = PW
END IF
WHILE LZDIC(TMP) <> -1
TMP = LZDIC(TMP)
WEND
C = LZCHR(TMP)
IF CHECK THEN
LZWOUTPUT OUTFILE, LZDIC(CW), LZCHR(CW)
ELSE
LZWOUTPUT OUTFILE, P, C
END IF
IF CURPOS < 4096 THEN
LZDIC(CURPOS) = P
LZCHR(CURPOS) = C
CURPOS = CURPOS + 1
IF CURPOS = (2 ^ BITLEN - 1) AND BITLEN < 12 THEN
BITLEN = BITLEN + 1
END IF
END IF
END IF
WEND
' Close files
CLOSE OUTFILE
CLOSE INFILE
END SUB
SUB LZWOUTPUT (FILE AS INTEGER, DIC AS INTEGER, CHAR AS INTEGER)
SHARED LZDIC() AS INTEGER, LZCHR() AS INTEGER
DIM LZSTK(0 TO 127) AS STRING * 1
DIM X AS INTEGER, SP AS INTEGER
DIM LDIC AS INTEGER, LCHAR AS INTEGER
LCHAR = CHAR
LDIC = DIC
SP = 0
X = 1
DO
IF SP >= 128 THEN
PRINT "LZW: Stack Overflow!"
END
END IF
LZSTK(SP) = CHR$(LCHAR)
SP = SP + 1
IF LDIC <> -1 THEN
LCHAR = LZCHR(LDIC)
LDIC = LZDIC(LDIC)
ELSE
X = 0
END IF
LOOP WHILE X
WHILE SP <> 0
SP = SP - 1
PUT FILE, , LZSTK(SP)
WEND
END SUB
FUNCTION READBITS% (FILE AS INTEGER, NUMBITS AS INTEGER)
STATIC BITDAT AS STRING * 1, BITPOS AS INTEGER
DIM BITVAL AS INTEGER, BIT AS INTEGER
BITVAL = 0
FOR BIT = (NUMBITS - 1) TO 0 STEP -1
IF BITPOS = 0 THEN
GET FILE, , BITDAT
BITPOS = 7
ELSE
BITPOS = BITPOS - 1
END IF
IF ASC(BITDAT) AND 2 ^ BITPOS THEN
BITVAL = BITVAL OR 2 ^ BIT
END IF
NEXT BIT
READBITS% = BITVAL
END FUNCTION
FreeBasic
This code does not suffer from the 64K memory limit imposed by QuickBasic and so is less efficient, but runs faster. It can be compiled with FreeBasic compiler using the -lang=qb switch. Aside from memory concerns, all code here is compatible with QuickBasic.
The code before the subroutine is used to make a string containing the bit expansion of all values from 0 to 255. The subroutine takes a filename, reads the entire file into memory then expands each bit of data to a byte using the aforesaid string as Basic cannot deal with bits directly. cw$ is codeword, pw$ is the previous codeword, lun is the lowest dictionary entry that is empty, p is the location in the compressed data stream and bl is the length of codes in bits (Starting at nine bits increasing to 12)
The dictionary is set before decompression. The first 258 are the starting dictionary, the remainder are cleared. (It is vital to reset the dictionary for each file) An error occurs if entry 256 is found in the data, 'distrupt' is printed when the newest dictionary entry is not the lowest possible entry (This shouldn't happen but is possible.) Decompression ends at encountering entry 257, or when there is no more data to read.
DECLARE SUB LZWDEC (lfn AS STRING)
x$ = ""
FOR l = 0 TO 255
IF (l AND 128) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
IF (l AND 64) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
IF (l AND 32) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
IF (l AND 16) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
IF (l AND 8) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
IF (l AND 4) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
IF (l AND 2) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
IF (l AND 1) > 0 THEN x$ = x$ + "1" ELSE x$ = x$ + "0"
NEXT l
LZEDEC "EGALATCH.CK1"
END
'_________________________________________________
SUB LZWDEC (lfn AS STRING) ' Decompress LZW data
'_________________________________________________
DIM lzw(0 TO 4095) AS STRING
PRINT lfn; " is LZW compressed, decompressing...";
OPEN folder + lfn FOR BINARY AS #9
y$ = SPACE$(LOF(9))
GET #9, 1, y$
CLOSE #9
z$ = ""
FOR l = 7 TO LEN(y$)
z$ = z$ + MID$(x$, (ASC(MID$(y$, l, 1)) * 8) + 1, 8)
NEXT l
bl = 9
lun = 258
p = 1
cw$ = ""
y$ = ""
FOR l = 0 TO 4095
IF l < 256 THEN lzw(l) = CHR$(l) ELSE lzw(l) = ""
NEXT l
DO
IF lun = 511 THEN bl = 10
IF lun = 1023 THEN bl = 11
IF lun = 2047 THEN bl = 12
pw$ = cw$
u$ = MID$(z$, p, bl)
p = p + bl
y = 0
FOR l = 1 TO bl
IF MID$(u$, bl - l + 1, 1) = "1" THEN y = y + (2 ^ (l - 1))
NEXT l
IF y = 256 THEN
PRINT "LZW error in Keen data!"
OPEN "ERROR.DAT" FOR OUTPUT AS #9
PRINT #9, y$;
CLOSE
END
END IF
IF y = 257 THEN EXIT DO
IF cw$ = "" THEN
cw$ = lzw(y)
y$ = y$ + cw$
ELSE
IF lun < 4096 THEN
IF lzw(y) = "" THEN
cw$ = pw$ + LEFT$(pw$, 1)
lzw(y) = cw$
y$ = y$ + cw$
IF y <> lun THEN PRINT "Disrupt!"
lun = y + 1
ELSE
cw$ = lzw(y)
y$ = y$ + cw$
lzw(lun) = pw$ + LEFT$(cw$, 1)
lun = lun + 1
END IF
ELSE
IF lzw(y) = "" THEN
y$ = y$ + cw$
ELSE
cw$ = lzw(y)
y$ = y$ + cw$
END IF
END IF
END IF
LOOP WHILE p < LEN(z$)
IF y = 257 THEN PRINT "done" ELSE PRINT "out of data."
OPEN folder + LEFT$(lfn, 4) + extq FOR OUTPUT AS #9
PRINT #9, y$;
CLOSE #9
END SUB
Visual Basic .NET
This implementation uses high-level elements such as lambdas, anonymous arrays, and strict types. It must be compiled for the Microsoft .NET Framework v4.5 in Visual Studio 2012. It has the advantages of running in non-DosBox Windows and using .NET streams for simple reusability. It can also be used for higher-bit LZW.
Decompression
Decompressing is very fast.
Sub DecompressLZW(Data As IO.Stream, MaxBits As Byte, Output As IO.Stream)
' This source is by Fleexy and is in the public domain. If used, please note it as such.
Dim dict As New List(Of Byte())
For x = 0 To 255
dict.Add({x})
Next
dict.Add({})
dict.Add(Nothing)
Dim usebits As Byte = 9
Dim bpos As Long
Dim bits As New List(Of Byte)
Do Until Data.Position = Data.Length
Dim b, ub As Byte
b = Data.ReadByte
ub = b
For x = 7 To 0 Step -1
If ub - (2 ^ x) >= 0 Then
ub -= (2 ^ x)
bits.Add(1)
Else
bits.Add(0)
End If
Next
Loop
Dim GetCode = Function() As UInteger
Dim u As UInteger
For x = usebits To 1 Step -1
If bits(bpos) = 1 Then u += (2 ^ (x - 1))
bpos += 1
Next
Return u
End Function
Dim OutputCode = Sub(DecompData As Byte())
Dim n As UInteger = DecompData.Length
Output.Write(DecompData, 0, n)
End Sub
Dim AddToDict = Sub(Entry As Byte())
If dict.Count < (2 ^ MaxBits) Then
dict.Add(Entry)
If dict.Count = (2 ^ usebits) - 1 Then usebits = Math.Min(usebits + 1, MaxBits)
End If
End Sub
Dim fcode As UInteger = GetCode()
Dim match As Byte() = dict(fcode)
OutputCode(match)
Do
Dim ncode As UInteger = GetCode()
If ncode = 257 Then Exit Do
If ncode = 256 Then Throw New Exception
Dim nmatch As Byte()
If ncode < dict.Count Then
nmatch = dict(ncode)
Else
nmatch = match.Concat({match(0)}).ToArray
End If
OutputCode(nmatch)
AddToDict(match.Concat({nmatch(0)}).ToArray)
match = nmatch
Loop
End Sub
Compression
Compression is more difficult; consulting the dictionary for a byte array takes more time. The speed of this algorithm may be unacceptable.
Sub CompressLZW(Data As IO.Stream, MaxBits As Byte, Output As IO.Stream)
' This source is by Fleexy and is in the public domain.
Dim bits As New List(Of Byte)
Dim dict As New List(Of Byte())
For x = 0 To 255
dict.Add({x})
Next
dict.Add({})
dict.Add({})
Dim usebits As Byte = 9
Dim PutCode = Sub(Code As UInteger)
For x = usebits To 1 Step -1
If Code - (2 ^ (x - 1)) >= 0 Then
Code -= (2 ^ (x - 1))
bits.Add(1)
Else
bits.Add(0)
End If
Next
End Sub
Dim AddToDict = Function(Entry As Byte()) As Boolean
If dict.Count < 2 ^ MaxBits Then
dict.Add(Entry)
If dict.Count = 2 ^ usebits Then usebits = Math.Min(usebits + 1, MaxBits)
Return True
Else
Return False
End If
End Function
Dim FindCode = Function(Bytes As Byte()) As UInteger
For x = 1 To dict.Count
If dict(x - 1).Count = Bytes.Count Then
If dict(x - 1).SequenceEqual(Bytes) Then Return x - 1
End If
Next
Throw New NotFiniteNumberException
End Function
Dim DictContains = Function(Bytes As Byte()) As Boolean
For x = 1 To dict.Count
If dict(x - 1).Length = Bytes.Length Then
If dict(x - 1).SequenceEqual(Bytes) Then Return True
End If
Next
Return False
End Function
Dim match As Byte() = {}
Do Until Data.Position = Data.Length
Dim nbyte As Byte = Data.ReadByte
Dim nmatch As Byte() = match.Concat({nbyte}).ToArray
If DictContains(nmatch) Then
match = nmatch
Else
PutCode(FindCode(match))
AddToDict(nmatch)
match = {nbyte}
End If
Loop
PutCode(FindCode(match))
PutCode(257)
Do Until bits.LongCount Mod 8L = 0L
bits.Add(0)
Loop
For x = 1 To CInt(bits.LongCount / 8L)
Dim b As Byte = 0
For y = 0 To 7
b += bits((x - 1) * 8 + y) * (2 ^ (7 - y))
Next
Output.WriteByte(b)
Next
End Sub