Two recursive versions of the Shannon code

For a given memoryless information source, the Huffman code is the optimal prefix-free code in the sense of redundancy. In general, the length of each codeword in the Huffman code is a function of all symbol probabilities p1, p2, …, pn. In contrast, with the best-known suboptimal code, the Shannon code, the length of the i-th codeword, ⌈− log pi⌉, is a function of pi alone. In this paper, two recursive versions of the Shannon code (RYY and RSh) are proposed whose redundancy lies between that of the Huffman code and that of the Shannon code. In particular, the redundancy is not greater than that of the Shannon code, and the i-th codeword length does not depend on pi+1, pi+2, …, pn. To evaluate the overall performance of the proposed codes, their redundancy is treated as a random variable on the set of all sources with n symbols. An algorithm for generating random n-tuple distributions is derived, and the expected redundancy of the resulting codes is estimated. Recently, it was proven that the average redundancy of the Shannon code is around 0.5 bits. Simulation shows that for n ≫ 20 the average redundancies of the proposed codes are about 0.1 and 0.06 bits, while it is approximately 0.03 bits for the Huffman code.
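As background for the abstract above, a minimal sketch of the classical Shannon code (not the proposed recursive RYY/RSh variants, whose construction is given in the paper) can illustrate the ⌈− log pi⌉ codeword lengths and the redundancy being measured. The function names here are illustrative, not from the paper.

```python
import math

def shannon_code(probs):
    """Classical Shannon code: sort probabilities in decreasing order and
    assign symbol i a codeword of length ceil(-log2 p_i), taken as the first
    bits of the binary expansion of the cumulative sum F_i = p_1+...+p_{i-1}."""
    probs = sorted(probs, reverse=True)
    codewords, cum = [], 0.0
    for p in probs:
        length = math.ceil(-math.log2(p))
        # Extract the first `length` bits of the binary expansion of `cum`.
        bits, frac = [], cum
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            bits.append(str(bit))
            frac -= bit
        codewords.append("".join(bits))
        cum += p
    return probs, codewords

def redundancy(probs, lengths):
    """Redundancy = average codeword length minus source entropy (bits/symbol)."""
    entropy = -sum(p * math.log2(p) for p in probs)
    avg_len = sum(p * l for p, l in zip(probs, lengths))
    return avg_len - entropy
```

For a dyadic source such as (0.5, 0.25, 0.125, 0.125) the Shannon code is already optimal with zero redundancy; for non-dyadic sources its redundancy can approach 1 bit, which is the gap the recursive versions aim to shrink.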
