A fast and efficient nearly-optimal adaptive Fano coding scheme

Adaptive coding techniques are increasingly used in lossless data compression. They suit a wide range of applications in which online compression is required, including communications, internet, e-mail, and e-commerce. In this paper, we present an adaptive Fano coding method applicable to binary and multi-symbol code alphabets. We introduce the corresponding partitioning procedure, which deals with consecutive partitionings and possesses what we call the nearly-equal-probability property, i.e., it satisfies the principles of Fano coding. To determine the optimal partitioning, we propose a brute-force algorithm that searches the entire space of possible partitionings. We show that this algorithm runs in time polynomial in the size of the input alphabet, where the degree of the polynomial is given by the size of the output alphabet. As an alternative, we also propose a greedy algorithm that quickly finds a sub-optimal, but accurate, consecutive partitioning. Empirical results on real-life benchmark data files demonstrate that our scheme compresses and decompresses faster than adaptive Huffman coding, while consuming less memory.
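To illustrate the nearly-equal-probability idea behind Fano coding, the following is a minimal sketch of the classical static, binary Fano split. It is not the adaptive scheme or the greedy multi-symbol algorithm proposed in the paper, whose details are not given in the abstract. The function names `best_split` and `fano_codes` are hypothetical, and the symbol weights in the usage note are made up for illustration.

```python
from typing import Dict, List, Tuple


def best_split(weights: List[float]) -> int:
    """Return k (1 <= k < len(weights)) such that the totals of
    weights[:k] and weights[k:] are as close to each other as possible."""
    total = sum(weights)
    running = 0.0
    best_k, best_diff = 1, float("inf")
    for k in range(1, len(weights)):
        running += weights[k - 1]
        # |left - right| = |running - (total - running)|
        diff = abs(total - 2 * running)
        if diff < best_diff:
            best_k, best_diff = k, diff
    return best_k


def fano_codes(symbols: List[Tuple[str, float]]) -> Dict[str, str]:
    """Assign binary codewords by recursively splitting the sorted
    symbol list into two consecutive, nearly equally weighted parts.

    Assumption: at least two symbols; a single symbol would receive
    the empty codeword in this sketch.
    """
    # Sort by decreasing weight so each part is a consecutive run.
    ordered = sorted(symbols, key=lambda sw: sw[1], reverse=True)
    codes = {name: "" for name, _ in ordered}

    def assign(lo: int, hi: int) -> None:
        if hi - lo < 2:
            return
        k = lo + best_split([w for _, w in ordered[lo:hi]])
        for i in range(lo, k):
            codes[ordered[i][0]] += "0"
        for i in range(k, hi):
            codes[ordered[i][0]] += "1"
        assign(lo, k)
        assign(k, hi)

    assign(0, len(ordered))
    return codes
```

For example, `fano_codes([("a", 4), ("b", 3), ("c", 2), ("d", 1)])` splits the list first into `a` versus `b, c, d`, and the resulting codewords form a prefix-free set. An adaptive variant would additionally update the weights and the ordering after each encoded symbol.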
