Skeleton Trees for the Efficient Decoding of Huffman Encoded Texts

A new data structure is investigated that allows fast decoding of texts encoded by canonical Huffman codes. Its storage requirements are much lower than those of conventional Huffman trees, O(log² n) for trees of depth O(log n), and decoding is faster because some of the bit comparisons needed for decoding can be skipped. Empirical results on large real-life distributions show reductions of up to 50%, and in some cases more, in the number of bit operations. The basic idea is then generalized, yielding further savings.
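For readers unfamiliar with canonical Huffman codes, the following is a minimal sketch of the standard table-driven canonical decoding that such codes make possible. It is not the skeleton tree described above. It only illustrates why a canonical code can be decoded from a few per-length tables instead of a full tree. The function names (`build_tables`, `encode`, `decode`) and the example code lengths are assumptions made for illustration.

```python
def build_tables(lengths):
    """Build per-length tables for a canonical Huffman code.

    lengths: dict mapping each symbol to its codeword length (hypothetical input).
    """
    max_len = max(lengths.values())
    count = [0] * (max_len + 1)          # count[l] = number of codewords of length l
    for l in lengths.values():
        count[l] += 1
    # Symbols ordered by (length, symbol): the canonical ordering.
    symbols = sorted(lengths, key=lambda s: (lengths[s], s))
    first_code = [0] * (max_len + 1)     # numeric value of the first codeword of length l
    first_index = [0] * (max_len + 1)    # position in `symbols` of that first codeword
    code, index = 0, 0
    for l in range(1, max_len + 1):
        first_code[l] = code
        first_index[l] = index
        code = (code + count[l]) << 1
        index += count[l]
    return max_len, count, first_code, first_index, symbols


def encode(text, lengths):
    """Encode text as a string of '0'/'1' characters using the canonical code."""
    _, _, first_code, first_index, symbols = build_tables(lengths)
    codes = {}
    for rank, s in enumerate(symbols):
        l = lengths[s]
        codes[s] = format(first_code[l] + rank - first_index[l], "0{}b".format(l))
    return "".join(codes[c] for c in text)


def decode(bits, lengths):
    """Decode a '0'/'1' string bit by bit, using only the per-length tables."""
    max_len, count, first_code, first_index, symbols = build_tables(lengths)
    out = []
    code, l = 0, 0
    for b in bits:
        code = (code << 1) | int(b)
        l += 1
        offset = code - first_code[l]
        if 0 <= offset < count[l]:
            # The bits read so far form a complete codeword of length l.
            out.append(symbols[first_index[l] + offset])
            code, l = 0, 0
        elif l == max_len:
            raise ValueError("invalid bit sequence for this code")
    return "".join(out)


# Hypothetical code lengths; the resulting codewords are a=0, b=10, c=110, d=111.
example_lengths = {"a": 1, "b": 2, "c": 3, "d": 3}
encoded = encode("abcda", example_lengths)
decoded = decode(encoded, example_lengths)
```

This baseline decoder still inspects every bit of the input. The savings claimed in the abstract come from avoiding some of those bit comparisons, which this sketch does not attempt.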
