Notes

Chapter 10: Processes of Perception and Analysis

Section 5: Data Compression

Huffman coding

From a list p of probabilities for blocks, the list of codewords can be generated using

Map[Drop[Last[#], -1] &, Sort[Flatten[MapIndexed[Rule, FixedPoint[Replace[Sort[#], {{p0_, i0_}, {p1_, i1_}, pi___}  {{p0 + p1, {i0, i1}}, pi}] & , MapIndexed[List, p]]〚1, 2〛, {-1}]]]] -1

Given the list of codewords c, the sequence of blocks that occur in encoded data d can be uniquely reconstructed using

First[{{}, d} //. MapIndexed[{{r___}, Flatten[{#1, s___}]}  {{r,#2〚1〛},{s}} &, c]]

Note that the encoded data can consist of any sequence of 0's and 1's. If all 2^b possible blocks of length b occur with equal probability, then the Huffman codewords will consist of blocks equivalent to the original ones. In an opposite extreme, blocks with probabilities 1/2, 1/4, 1/8, ... will yield codewords of lengths 1, 2, 3, ...

In practical applications, Huffman coding is sometimes extended to allow the choice of codewords to be updated dynamically as more data is read.