Implementation of Lossy Text Compression by Exploiting Redundancy

Regardless of the source language, text documents contain a significant amount of redundancy. Data compression exploits this redundancy to improve transmission efficiency and/or save storage space. Conventionally, various lossless text compression algorithms have been introduced for critical applications, where any loss after recovery is intolerable. For non-critical applications, i.e., where some data loss is acceptable, one may employ lossy compression to achieve higher efficiency. We use three recent techniques to achieve character-oriented lossy text compression: Letter Mapping (LM), Dropped Vowels (DV), and Replacement of Characters (RC). We apply them as a front end with the aim of improving the compression performance of conventional compression algorithms. We implement the scheme on English and Turkish sample texts and compare the results. Additionally, we report the performance improvement rates for these models when used as a front end to the Huffman and Arithmetic Coding algorithms. As future work, we propose several ideas to further improve the current performance of each model.
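
To make the front-end idea concrete, the following is a minimal sketch of how a Dropped Vowels stage might feed a Huffman coder. It is an illustration under stated assumptions, not the implementation evaluated here: the vowel set covers English only, `drop_vowels` and `huffman_bits` are hypothetical helper names, and the size estimate ignores the cost of storing the code table.

```python
import heapq
from collections import Counter

# Assumption: English vowels only; Turkish text would need a larger set.
VOWELS = set("aeiouAEIOU")


def drop_vowels(text: str) -> str:
    """Lossy front end: remove every vowel (a simplified reading of DV)."""
    return "".join(ch for ch in text if ch not in VOWELS)


def huffman_bits(text: str) -> int:
    """Return the number of bits a Huffman code needs for `text`.

    The code table itself is not counted. The total length equals the
    sum of the weights of all internal nodes built during merging.
    """
    freqs = Counter(text)
    if not freqs:
        return 0
    if len(freqs) == 1:
        return len(text)  # a single symbol still costs one bit each
    heap = list(freqs.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


if __name__ == "__main__":
    sample = "data compression exploits redundancy"
    before = huffman_bits(sample)
    after = huffman_bits(drop_vowels(sample))
    print(f"Huffman only: {before} bits, DV + Huffman: {after} bits")
```

Comparing the two bit counts on a sample text gives a rough feel for the kind of improvement rate a lossy front end can contribute; recovering readable text from the vowel-less output is a separate problem that this sketch does not address.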
