A Comparison between English and Arabic Text Compression

This paper compares two techniques, Huffman coding and the LZW algorithm, for compressing document data in Arabic and English. To carry out the comparison, two or more constituent data documents in each language are identified. To the best of our knowledge, this is the first such comparison to take Arabic data compression into consideration. The techniques are implemented in C++ with Borland C++ Builder so that any document can be compressed. Our numerical experiments show that the Huffman technique is better suited to Arabic documents, while the LZW algorithm is better suited to TIFF, GIF, and English text files.
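
As a rough illustration of how such a comparison could be set up, the following minimal C++ sketch estimates the encoded size of a byte string under Huffman coding and under a textbook LZW encoder. It is not the authors' Borland C++ Builder implementation: the function names `huffman_bits` and `lzw_encode` are hypothetical, the input is assumed to be a sequence of bytes (for Arabic, whatever encoding the file uses, for example UTF-8 or a single-byte code page), and the 4096-entry dictionary with fixed 12-bit codes is an assumed, common LZW configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Bits needed to Huffman-code `data`, excluding the code table (assumption:
// symbols are single bytes). The encoded length equals the sum of the
// weights of all internal nodes created while building the tree.
std::uint64_t huffman_bits(const std::string& data) {
    std::uint64_t freq[256] = {0};
    for (unsigned char c : data) ++freq[c];

    std::priority_queue<std::uint64_t, std::vector<std::uint64_t>,
                        std::greater<std::uint64_t>> heap;  // min-heap of weights
    for (std::uint64_t f : freq)
        if (f > 0) heap.push(f);

    // A single distinct symbol still needs one bit per occurrence.
    if (heap.size() == 1) return heap.top();

    std::uint64_t total = 0;
    while (heap.size() > 1) {
        std::uint64_t a = heap.top(); heap.pop();
        std::uint64_t b = heap.top(); heap.pop();
        total += a + b;       // every symbol under this node gains one bit
        heap.push(a + b);
    }
    assert(heap.size() <= 1);
    return total;
}

// Textbook LZW over bytes. Assumption: the dictionary starts with the 256
// single-byte strings and stops growing at 4096 entries (12-bit codes).
std::vector<std::uint32_t> lzw_encode(const std::string& data) {
    const std::size_t kMaxEntries = 4096;
    std::map<std::string, std::uint32_t> dict;
    for (int i = 0; i < 256; ++i)
        dict[std::string(1, static_cast<char>(i))] = static_cast<std::uint32_t>(i);

    std::vector<std::uint32_t> codes;
    std::string w;  // longest dictionary match found so far
    for (char c : data) {
        std::string wc = w + c;
        if (dict.count(wc)) {
            w = wc;                       // keep extending the match
        } else {
            codes.push_back(dict.at(w));  // emit the code for the match
            if (dict.size() < kMaxEntries) {
                std::uint32_t next = static_cast<std::uint32_t>(dict.size());
                dict[wc] = next;          // learn the new phrase
            }
            w = std::string(1, c);
        }
    }
    if (!w.empty()) codes.push_back(dict.at(w));
    return codes;
}
```

For a given document's bytes `text`, comparing `huffman_bits(text)` with `12 * lzw_encode(text).size()` gives a rough like-for-like estimate in bits. Neither figure includes header or code-table overhead, so real file sizes will differ.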
