TEXT DATABASE COMPRESSION USING REPLACEMENT AND BIT REDUCTION

This paper presents a simple compression algorithm based on word repetition and number system theory. Frequently occurring words are replaced by special characters, and the modified file is then treated as a number in base n, where n is the number of distinct characters in the file. Compression is completed by converting this base-n number to binary. The main idea is to represent the whole file in the smallest number system that can hold it, which reduces the number of bits required: a file with n distinct characters needs only about log2(n) bits per character. Compression and decompression are both simple, and the technique is well suited to databases, which contain frequently repeated words such as last names.
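The two stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the choice of placeholder characters, the space-separated tokenization, and the rule "replace only words that occur more than once" are all hypothetical choices made for the example.

```python
from collections import Counter


def replace_frequent_words(text, placeholders):
    """Stage 1: swap the most frequent words for single placeholder characters.

    Assumption: words are separated by single spaces, and only placeholders
    that do not already occur in the text are used.
    """
    free = [p for p in placeholders if p not in text]
    words = text.split(" ")
    counts = Counter(w for w in words if len(w) > 1)
    # Only words that repeat are worth replacing.
    frequent = [w for w, c in counts.most_common(len(free)) if c > 1]
    table = dict(zip(frequent, free))
    replaced = " ".join(table.get(w, w) for w in words)
    return replaced, table


def encode_base_n(text):
    """Stage 2: read the text as one base-n integer, n = number of distinct characters.

    Python integers are stored in binary, so building the integer is the
    base-n to binary conversion.
    """
    alphabet = sorted(set(text))
    n = len(alphabet)
    digit = {ch: i for i, ch in enumerate(alphabet)}
    value = 0
    for ch in text:
        value = value * n + digit[ch]
    return value, alphabet, len(text)


def decode_base_n(value, alphabet, length):
    """Invert encode_base_n by peeling off base-n digits, least significant first."""
    n = len(alphabet)
    chars = []
    for _ in range(length):
        value, d = divmod(value, n)
        chars.append(alphabet[d])
    return "".join(reversed(chars))


def restore_words(text, table):
    """Invert replace_frequent_words using the stored word table."""
    inverse = {symbol: word for word, symbol in table.items()}
    return " ".join(inverse.get(w, w) for w in text.split(" "))


if __name__ == "__main__":
    original = "smith john smith mary smith john"
    replaced, table = replace_frequent_words(original, "#$")
    value, alphabet, length = encode_base_n(replaced)
    print("bits before:", 8 * len(original))
    print("bits after: ", value.bit_length())
    roundtrip = restore_words(decode_base_n(value, alphabet, length), table)
    print("lossless:", roundtrip == original)
```

In this sketch the decoder needs the word table, the alphabet, and the text length in addition to the integer itself, so the real saving on a file depends on how compactly that side information is stored. The bit count printed for the original assumes 8 bits per character.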
