A bit-level text compression scheme based on the ACW algorithm

This paper describes and evaluates a new bit-level, lossless, adaptive, and asymmetric data compression scheme based on the adaptive character wordlength (ACW(n)) algorithm. The proposed scheme improves the compression ratio of the ACW(n) algorithm by dividing the binary sequence into a number of subsequences (s), each satisfying the condition that the number of distinct decimal values (d) of its n-bit characters is at most 256. The new scheme is therefore referred to as ACW(n, s), where n is the adaptive character wordlength and s is the number of subsequences. The scheme was used to compress a number of text files from standard corpora. The results show that ACW(n, s) achieves a higher compression ratio than many widely used compression algorithms and performs competitively with state-of-the-art compression tools.
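The subsequence condition can be illustrated with a minimal sketch. The abstract does not say how subsequence boundaries are chosen, so the greedy rule below (start a new subsequence as soon as one more n-bit character would push the distinct count past the limit) is an assumption, as are the function name `split_into_subsequences`, the `max_distinct` parameter, and the zero-padding of the tail. It is not the authors' implementation.

```python
from typing import List


def split_into_subsequences(bits: str, n: int, max_distinct: int = 256) -> List[List[int]]:
    """Split a bit string into subsequences of n-bit characters.

    Hypothetical greedy rule: each subsequence holds at most
    `max_distinct` distinct decimal values (d <= 256 by default).
    """
    if n <= 0:
        raise ValueError("n must be positive")

    # Assumption: pad the tail with zeros so the length is a multiple of n.
    remainder = len(bits) % n
    if remainder:
        bits += "0" * (n - remainder)

    subsequences: List[List[int]] = []
    current: List[int] = []
    seen = set()  # distinct decimal values in the current subsequence

    for i in range(0, len(bits), n):
        value = int(bits[i:i + n], 2)  # decimal value of one n-bit character
        # A new value that would exceed the limit closes the subsequence.
        if value not in seen and len(seen) == max_distinct:
            subsequences.append(current)
            current, seen = [], set()
        seen.add(value)
        current.append(value)

    if current:
        subsequences.append(current)
    return subsequences


# Toy usage with a limit of 2 distinct values instead of 256:
print(split_into_subsequences("00011011", n=2, max_distinct=2))
# → [[0, 1], [2, 3]]
```

Because every subsequence contains at most 256 distinct values, each of its n-bit characters could be mapped to a single byte, which is the property the condition d ≤ 256 is there to guarantee. In this sketch, s is simply the length of the returned list.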
