Notes

Chapter 10: Processes of Perception and Analysis

Section 5: Data Compression

Maximal block compression

If one has data that consists of a long sequence of blocks, each of length b, and each independently chosen with probability p[i] to be of type i, then, as argued by Claude Shannon in the late 1940s, the minimum number of base 2 bits needed on average to represent each block in such a sequence is h = -Sum[p[i] Log[2, p[i]], {i, 2^b}] (with terms where p[i] = 0 taken to be 0). If all blocks occur with equal probability 2^-b, then h takes on its maximum possible value of b. If only one block occurs with nonzero probability, then h = 0. Following Shannon, the quantity h (whose form is analogous to entropy in physics, as discussed on page 1020) is often referred to as "information content". This name, however, is very misleading. For certainly h does not in general give the length of the shortest possible description of the data; all it does is give the shortest length of description obtained by treating successive blocks as if they occur with independent probabilities. With this assumption one then finds that maximal compression occurs if a block of probability p[i] is represented by a codeword of length -Log[2, p[i]]. Huffman coding with a large number of codewords will approach this if all the p[i] are powers of 1/2. (The requirement that codewords be self-delimiting leads to deviations when there are only a few codewords.) For p[i] that are not powers of 1/2, codewords of non-integer length would be required. The method of arithmetic coding provides an alternative in which the output does not consist of separate codewords concatenated together. (Compare algorithmic information content discussed on pages 554 and 1067.)
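The quantities above can be checked numerically with a short Python sketch. It computes h for a list of block probabilities, the ideal codeword lengths -Log[2, p[i]], and the integer codeword lengths produced by the standard textbook Huffman construction. The function names (block_entropy, ideal_lengths, huffman_lengths, block_probabilities) are hypothetical, and estimating p[i] from observed block counts in a bit string is an added assumption; the text itself takes the p[i] as given.

```python
import heapq
import itertools
import math
from collections import Counter


def block_entropy(probs):
    """h = -Sum p Log2 p; zero-probability block types contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def ideal_lengths(probs):
    """Shannon's ideal (generally non-integer) codeword length for each block type."""
    return [-math.log2(p) for p in probs]


def huffman_lengths(probs):
    """Codeword length of each block type under the textbook Huffman construction.

    Assumes all probabilities are positive. Repeatedly merges the two least
    probable groups; every block type in a merged group gains one bit.
    """
    tiebreak = itertools.count()  # keeps heap comparisons away from the lists
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), group1 + group2))
    return lengths


def block_probabilities(bits, b):
    """Hypothetical helper: estimate p[i] from non-overlapping length-b blocks."""
    blocks = [bits[i:i + b] for i in range(0, len(bits) - b + 1, b)]
    counts = Counter(blocks)
    return [c / len(blocks) for c in counts.values()]


if __name__ == "__main__":
    dyadic = [1/2, 1/4, 1/8, 1/8]  # all powers of 1/2
    print(block_entropy(dyadic), ideal_lengths(dyadic), huffman_lengths(dyadic))

    skewed = [0.9, 0.1]  # not powers of 1/2
    average = sum(p * n for p, n in zip(skewed, huffman_lengths(skewed)))
    print(block_entropy(skewed), average)

    data = "00011011" * 4
    for b in (2, 8):
        print(b, block_entropy(block_probabilities(data, b)))
```

For the dyadic case the Huffman lengths 1, 2, 3, 3 coincide with -Log[2, p[i]], so the average is exactly h = 1.75 bits. For p = {0.9, 0.1} the ideal lengths are about 0.15 and 3.32 bits, Huffman must round both up to 1 bit, and the average of 1 bit per block is roughly twice h ≈ 0.47; this is the gap that arithmetic coding is designed to close. The repeated string gives h = 2 bits per block with b = 2 but h = 0 with b = 8, illustrating that h reflects only the block statistics it is computed from, not the shortest description of the data.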