OSMAN ― A Novel Arabic Readability Metric

We present OSMAN (Open Source Metric for Measuring Arabic Narratives), a novel open-source Arabic readability metric and tool. It allows researchers to calculate the readability of Arabic text with or without diacritics. OSMAN is a modified version of conventional readability formulas such as Flesch and Fog. In our work we introduce a novel approach to counting short, long and stress syllables in Arabic, which is essential for judging the readability of Arabic narratives. We also introduce an additional factor called “Faseeh”, which considers aspects of script usually dropped in informal Arabic writing. To evaluate our methods, we used Spearman’s correlation to compare text readability for 73,000 parallel sentences from English and Arabic UN documents. The Arabic sentences were written without diacritics, so in order to count syllables we first added the diacritics using an open-source tool called Mishkal. The results show that the OSMAN readability formula correlates well with the English formulas, making it a useful tool for researchers and educators working with Arabic text.
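The abstract combines three computational steps: counting syllables on diacritized Arabic, feeding those counts into Flesch- and Fog-style formulas, and comparing readability scores with Spearman's correlation. The Python sketch below illustrates those steps under stated assumptions; it does not reproduce OSMAN. The formulas use the conventional English Flesch Reading Ease and Gunning Fog constants, there is no Faseeh term, and `classify_syllables` is a hypothetical heuristic (one syllable per vowel mark, with tanween labelled "stress" purely as an assumption) rather than the authors' syllable rules. The input is assumed to be diacritized already, for example by a tool such as Mishkal. All function names are illustrative.

```python
import re
from math import sqrt

# Arabic diacritic code points (assumes the text is already diacritized).
SHORT_VOWELS = {"\u064E", "\u064F", "\u0650"}  # fatha, damma, kasra
TANWEEN = {"\u064B", "\u064C", "\u064D"}       # fathatan, dammatan, kasratan
SHADDA = "\u0651"
DIACRITICS = SHORT_VOWELS | TANWEEN | {SHADDA, "\u0652", "\u0670"}  # + sukun, dagger alif
LONG_VOWEL_LETTERS = {"\u0627", "\u0648", "\u064A", "\u0649"}  # alif, waw, ya, alif maqsura
WORD_RE = re.compile(r"[\u0621-\u0652\u0670]+")  # Arabic letters plus diacritics
SENTENCE_RE = re.compile(r"[.!?\u061F]+")         # includes the Arabic question mark


def classify_syllables(word):
    """HYPOTHETICAL heuristic, not the OSMAN rules: one syllable per vowel mark.

    A short vowel followed by a bare (or sukun-marked) alif/waw/ya counts as
    "long"; any other short vowel as "short"; tanween is labelled "stress"
    purely as an assumption of this sketch.
    """
    counts = {"short": 0, "long": 0, "stress": 0}
    for i, ch in enumerate(word):
        if ch in TANWEEN:
            counts["stress"] += 1
        elif ch in SHORT_VOWELS:
            j = i + 1
            while j < len(word) and word[j] in DIACRITICS:  # skip stacked marks
                j += 1
            nxt = word[j] if j < len(word) else ""
            after = word[j + 1] if j + 1 < len(word) else ""
            # The next letter lengthens the vowel only if it has no vowel/shadda of its own.
            if nxt in LONG_VOWEL_LETTERS and after not in SHORT_VOWELS | TANWEEN | {SHADDA}:
                counts["long"] += 1
            else:
                counts["short"] += 1
    return counts


def text_stats(text):
    """Return (sentence count, word count, list of syllables per word)."""
    sentences = [s for s in SENTENCE_RE.split(text) if s.strip()]
    words = WORD_RE.findall(text)
    syllables = [sum(classify_syllables(w).values()) for w in words]
    return len(sentences), len(words), syllables


def flesch_reading_ease(n_sent, n_words, n_syll):
    # Conventional English Flesch Reading Ease constants (not OSMAN's weights).
    return 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)


def gunning_fog(n_sent, n_words, n_complex):
    # Gunning Fog index; "complex" = 3+ syllables, the English convention.
    return 0.4 * ((n_words / n_sent) + 100 * (n_complex / n_words))


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    sample = "ذَهَبَ الوَلَدُ إِلَى المَدْرَسَةِ."  # diacritized input
    n_sent, n_words, syll = text_stats(sample)
    print(n_sent, n_words, sum(syll))  # 1 4 12
    print(round(flesch_reading_ease(n_sent, n_words, sum(syll)), 3))  # -51.025
    print(round(gunning_fog(n_sent, n_words, sum(s >= 3 for s in syll)), 3))  # 31.6
    print(round(spearman([1, 2, 3, 4], [10, 30, 20, 40]), 3))  # 0.8
```

On undiacritized input the counter finds no vowel marks and returns zero syllables, which is why a diacritization step has to run first. The strongly negative Flesch score for this short sentence is one illustration of why the English weights are not meant to be applied to Arabic unchanged.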