An adaptive character wordlength algorithm for data compression

This paper presents a new and efficient data compression algorithm, the adaptive character wordlength (ACW) algorithm, which can be used as a complementary algorithm to statistical compression techniques. In such techniques, the characters in the source file are converted to binary codes, where the most common characters in the file have the shortest codes and the least common have the longest; the codes are generated from the estimated probability of each character within the file. The binary coded file is then compressed using an 8-bit character wordlength. In the new algorithm, an optimum character wordlength, b, is calculated, where b > 8, so that the compression ratio is increased by a factor of b/8. To validate the algorithm, it is used as a complement to Huffman coding to compress a source file containing 10 characters with different probabilities, randomly distributed within the file. The results obtained and the factors that affect the optimum value of b are discussed, and conclusions are presented.
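The baseline described above (Huffman coding followed by packing the bit stream into 8-bit characters) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the assumption that each source character occupies one byte, and the probabilities in the demo are all hypothetical. The abstract does not state how the optimum b is found, so the sketch does not attempt that search; it only applies the stated b/8 factor to the baseline ratio.

```python
import heapq
import random
from collections import Counter


def huffman_codes(text):
    """Return a {character: bit string} Huffman code for the text."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {char: partial code}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c0.items()}
        merged.update({ch: "1" + code for ch, code in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


def pack_bits(bits, wordlength=8):
    """Split a bit string into fixed-size words, zero-padding the last one."""
    padded = bits + "0" * (-len(bits) % wordlength)
    return [padded[i:i + wordlength] for i in range(0, len(padded), wordlength)]


def compression_ratio(text, b=8):
    """Baseline ratio with 8-bit packing, scaled by the b/8 factor.

    Assumption of this sketch: every source character occupies one byte.
    With b=8 this is the plain Huffman ratio; b>8 only applies the factor
    stated in the abstract and does not search for an optimum b.
    """
    codes = huffman_codes(text)
    bits = "".join(codes[ch] for ch in text)
    base = len(text) / len(pack_bits(bits, 8))
    return base * b / 8


if __name__ == "__main__":
    # Hypothetical 10-character source with unequal probabilities,
    # randomly distributed within the file.
    alphabet = "abcdefghij"
    weights = [30, 20, 15, 10, 8, 6, 5, 3, 2, 1]
    source = "".join(random.Random(0).choices(alphabet, weights, k=10_000))
    print("Huffman, 8-bit packing:", round(compression_ratio(source), 3))
    print("With b = 16 (factor 2):", round(compression_ratio(source, 16), 3))
```

The bit-string representation is chosen for readability rather than speed; a practical encoder would pack bits into integers or bytes directly.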
