Compression of small text files

This paper proposes a novel compression scheme for small text files. The scheme combines Boolean minimization of binary data with the Burrows-Wheeler transform (BWT). Compressing small text files poses special requirements because such files provide little context. Applying Boolean minimization together with the BWT generates better context information for subsequent compression with standard algorithms. We tested the proposed scheme on collections of small and medium-sized files. The results showed that the proposed scheme improves the compression ratio over other existing methods.
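As an illustration of the BWT step only, the following is a minimal textbook sketch of the transform and its inverse. It is not the authors' implementation, and the Boolean minimization stage is not shown because the abstract does not describe it. The function names `bwt` and `inverse_bwt` and the choice of sentinel character are assumptions made for this example, and the naive rotation sort is only practical for small inputs.

```python
def bwt(text: str, sentinel: str = "\x00") -> str:
    """Naive Burrows-Wheeler transform (illustrative sketch only).

    Assumes `sentinel` is a single character that does not occur in
    `text` and that sorts before every character of `text`.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every cyclic rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "\x00") -> str:
    """Invert the naive transform by rebuilding the rotation table."""
    n = len(last_column)
    table = [""] * n
    # Prepend the last column and re-sort, n times in total.
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana", sentinel="$")
    print(transformed)
    print(inverse_bwt(transformed, sentinel="$"))
```

The transform tends to group identical characters into runs, which is what gives a later standard coder more usable context even when the input is short.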
