# Talk:Huffman Compression

## Constructing an optimal tree

Wikipedia documents a different way to construct an optimal tree: Sort the list of elements by frequency. Remove the two elements with the lowest frequencies from the list, create a new tree with these two elements as children, add their frequencies together, and put the tree back into the sorted list. Repeat until only one element remains in the list. Following this algorithm, I've constructed the following tree for the "She sells seashells by the sea shore." example:

```
          __[Root]_____
         /             \
     ___|__          ___|___
    /      \      __/_      \
   / \    / \    /    \      s
  e   _  / \  h  l   /   \
        / \  a      / \ / \
       / \  S       y . r t
      b o
```
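The merge procedure described above can be sketched in Python (a minimal illustration, not anyone's actual implementation; the helper name and the dict-of-depths representation are my own). It repeatedly merges the two lowest-frequency nodes and tracks how deep each symbol ends up, which is its code length in bits:

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Build a Huffman tree by repeatedly merging the two lowest-
    frequency nodes; return each symbol's code length in bits."""
    freq = Counter(text)
    # Each heap entry: (frequency, unique tiebreaker, {symbol: depth}).
    # The tiebreaker keeps Python from ever comparing the dicts.
    heap = [(f, i, {ch: 0}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        merged = {ch: d + 1 for ch, d in (*a.items(), *b.items())}
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "She sells seashells by the sea shore."
lengths = huffman_code_lengths(text)
total = sum(Counter(text)[ch] * bits for ch, bits in lengths.items())
print(total)  # 122
```

Ties can be broken differently from run to run, so individual code lengths may differ from the tree above, but every optimal tree gives the same 122-bit total.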

's' takes 2 bits to store; 'e', '_', 'h' and 'l' take 3 bits; 'a' takes 4 bits; 'S', 'y', '.', 'r' and 't' take 5 bits; and 'b' and 'o' take 6 bits. "She sells seashells by the sea shore." takes 122 bits to store. Using the tree on this page it takes 124 bits. So is my tree more "optimal", or is there something wrong with my calculations? Frenkel (talk) 11:13, 10 September 2021 (GMT)
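The 122-bit total can be checked by multiplying each symbol's frequency in the sentence by the code lengths claimed above (a quick arithmetic check; the dict is just a transcription of those lengths, with '_' written as the space character):

```python
from collections import Counter

text = "She sells seashells by the sea shore."
# Code lengths in bits, as listed above ('_' in the tree is the space).
bits = {'s': 2, 'e': 3, ' ': 3, 'h': 3, 'l': 3, 'a': 4,
        'S': 5, 'y': 5, '.': 5, 'r': 5, 't': 5, 'b': 6, 'o': 6}
freq = Counter(text)
total = sum(freq[ch] * bits[ch] for ch in freq)
print(total)  # 122
```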

That honestly seems more like a question for a place like StackOverflow or Xentax. Not sure we have many LZW/Huffman experts here. I personally never managed to wrap my head around it. -Nyerguds (talk) 11:23, 15 September 2021 (GMT)
Huffman is actually pretty easy, once you've written your own decompressor and compressor. :) Frenkel (talk) 10:42, 17 September 2021 (GMT)
I don't fully understand the description of either process for creating the tree, but it's clear that having the most common character take up one less bit is going to produce a more optimal output, unless all the other characters have a similar frequency (i.e. cutting one bit from character A won't help if characters B and C each require one extra bit, and the total number of B + C characters is more than A.) So possibly you are seeing a more optimal tree because one character is much more frequent than any other? But either way I'd encourage you to add your method to the page, even if it's included as an alternative to what's there. -- Malvineous (talk) 08:45, 6 October 2021 (GMT)
Honestly I'm not surprised people can still optimise these old compression schemes. Even on something as straightforward as RLE I managed to make a few simple logical tweaks that resulted in my compression code giving smaller results than the original files on absolutely every RLE-using game I've found so far. I mean, the real test is simply to recompress game files and see if 1) they end up smaller, and 2) the game can still read them. -Nyerguds (talk) 09:10, 6 October 2021 (GMT)