Simple Rules for Syllabification of Arabic Texts

The Arabic language is the sixth most used language in the world today, and it is one of the official languages of the United Nations. Moreover, the Arabic alphabet is the second most widely used alphabet in the world. Computer processing of Arabic text is therefore an increasingly important task. Several books on the analysis of the Arabic language have been published, but language analysis is only one step in language processing. In the field of text compression, several approaches have been developed. The first and most intuitive is character-based compression, which is suitable for small files. The second, word-based compression, is well suited to very large files. The third, syllable-based compression, uses the syllable as its basic element. Algorithms for the syllabification of English, German and other European languages are well known, but syllabification algorithms for Arabic and their use in text compression have not been investigated in depth. This paper describes a new and very simple algorithm for the syllabification of Arabic and its use in text compression.
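The three compression approaches differ mainly in which unit they treat as a symbol: a character, a word, or a syllable. The Python sketch below makes that contrast concrete by splitting the same text into each kind of unit. The syllabification rule in it is a hypothetical illustration, not the rule set proposed in this paper. It assumes that the vowels are the long-vowel letters alef, waw and yeh plus the short-vowel diacritics fatha, damma and kasra, and it uses the fact that an Arabic syllable begins with a single consonant (CV, CVV, CVC, CVVC, CVCC), so a consonant followed by a vowel opens a new syllable. The sketch ignores the cases where waw and yeh act as consonants.

```python
import re

# HYPOTHETICAL vowel set (an assumption, not the paper's rule set): the
# long-vowel letters alef, waw, yeh and the short-vowel diacritics
# fatha, damma, kasra. Every other character counts as a consonant.
ARABIC_VOWELS = frozenset("\u0627\u0648\u064a\u064e\u064f\u0650")


def split_characters(text: str) -> list[str]:
    """Character-based model: every character is one symbol."""
    return list(text)


def split_words(text: str) -> list[str]:
    """Word-based model: alternating runs of non-space and space characters."""
    return re.findall(r"\S+|\s+", text)


def syllabify(word: str, vowels: frozenset[str] = ARABIC_VOWELS) -> list[str]:
    """Split one word so that every syllable begins with a single consonant.

    A consonant directly followed by a vowel opens a new syllable, provided
    the syllable being built already has a vowel. Consonants before the
    first vowel join the first syllable; a word with no vowel at all
    (common in unvocalized text) stays a single syllable.
    """
    syllables: list[str] = []
    current = ""
    has_vowel = False  # does `current` already contain a vowel (nucleus)?
    for i, ch in enumerate(word):
        is_vowel = ch in vowels
        next_is_vowel = i + 1 < len(word) and word[i + 1] in vowels
        if not is_vowel and next_is_vowel and has_vowel:
            syllables.append(current)  # close the previous syllable
            current, has_vowel = "", False
        current += ch
        has_vowel = has_vowel or is_vowel
    if current:
        syllables.append(current)
    return syllables


def split_syllables(text: str, vowels: frozenset[str] = ARABIC_VOWELS) -> list[str]:
    """Syllable-based model: syllabify each word, keep whitespace runs as-is."""
    units: list[str] = []
    for token in split_words(text):
        units.extend([token] if token.isspace() else syllabify(token, vowels))
    return units


if __name__ == "__main__":
    latin = frozenset("aiu")  # Latin transliteration, for readability only
    print(split_syllables("kataba maktab kitaab", latin))
    # ['ka', 'ta', 'ba', ' ', 'mak', 'tab', ' ', 'ki', 'taab']
    print(syllabify("\u0627\u0644\u0643\u062a\u0627\u0628"))  # "al-kitab", unvocalized
```

Because short vowels are usually not written in Arabic, the unvocalized word for "the book" splits into only two units here, while the vocalized form with diacritics exposes more syllable boundaries. A syllable-based compressor would then code each unit as one symbol, so the resulting alphabet is much larger than the character set but much smaller than a word dictionary.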
