On the implementation of minimum redundancy prefix codes

Minimum redundancy coding (also known as Huffman coding) is one of the enduring techniques of data compression. Many efforts have been made to improve the efficiency of minimum redundancy coding, the majority based on improved representations for explicit Huffman trees. In this paper, we examine how minimum redundancy coding can be implemented efficiently by divorcing coding from a code tree, with emphasis on the situation when n, the size of the source alphabet, is large, perhaps on the order of 10^6. We review techniques for devising minimum redundancy codes, and consider in detail how encoding and decoding should be accomplished. In particular, we describe a modified decoding method that improves decoding speed, requiring just a few machine operations per output symbol (rather than per decoded bit), and that uses just a few hundred bytes of memory beyond the space required to store an enumeration of the source alphabet.
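
To illustrate what coding without an explicit code tree can look like, the following is a minimal sketch of canonical prefix coding in Python. It is not the method of the paper: it decodes one bit at a time, whereas the abstract describes a faster scheme that needs only a few operations per output symbol. The function names (`canonical_codes`, `decode`), the string-of-bits representation, and the example codeword lengths are all assumptions made for illustration. The point is only that the decoder needs the number of codewords of each length, the first codeword of each length, and the symbols in sorted order, with no tree.

```python
def canonical_codes(lengths):
    """Assign canonical codewords from a {symbol: codeword length} map.

    Assumes every length is at least 1 and that the lengths satisfy
    the Kraft inequality (for example, they came from Huffman's algorithm).
    """
    max_len = max(lengths.values())
    count = [0] * (max_len + 1)          # count[l] = number of codewords of length l
    for l in lengths.values():
        count[l] += 1

    first = [0] * (max_len + 1)          # first[l] = smallest codeword of length l
    code = 0
    for l in range(1, max_len + 1):
        code = (code + count[l - 1]) << 1
        first[l] = code

    # Symbols ordered by (length, symbol); codewords are assigned in this order.
    ordered = sorted(lengths, key=lambda s: (lengths[s], s))
    next_code = first[:]
    codes = {}
    for s in ordered:
        l = lengths[s]
        codes[s] = format(next_code[l], "0{}b".format(l))
        next_code[l] += 1
    return codes, count, first, ordered


def decode(bits, count, first, ordered):
    """Decode a string of '0'/'1' characters using only the three arrays."""
    out = []
    code = length = offset = 0           # offset = index of first symbol of this length
    for b in bits:
        code = (code << 1) | int(b)
        length += 1
        idx = code - first[length]
        if 0 <= idx < count[length]:     # code is a complete codeword of this length
            out.append(ordered[offset + idx])
            code = length = offset = 0
        else:                            # keep reading; skip symbols of this length
            offset += count[length]
    return out


# Hypothetical four-symbol alphabet with codeword lengths 1, 2, 3, 3.
codes, count, first, ordered = canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3})
message = "abcda"
encoded = "".join(codes[s] for s in message)
decoded = "".join(decode(encoded, count, first, ordered))
```

In this sketch the only per-symbol storage is the `ordered` list, which plays the role of the enumeration of the source alphabet; `count` and `first` have one entry per codeword length, so they stay small even when the alphabet is very large.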
