Towards the development of automatic readability measurement for the Arabic language

More than 200 readability formulas have been developed since the 1920s, but only a handful of them reliably determine the reading level of a sample text. Ascertaining the readability of curricula is an important step toward optimizing the effectiveness of the educational process. Readability has traditionally been measured by manual computation, which is a tedious and time-consuming task. Automatic readability computation, however, is now more popular than ever before, which can be attributed to advances in natural language processing. Languages such as English and Spanish have benefited from the development of automatic readability systems. The absence of automated readability measurement for Arabic texts, together with the large amount of information written in Arabic, encouraged us to work on an automatic readability system for our language, the mother tongue of millions of Arabs. In this paper we review the existing readability research and then propose a system for automating the readability measurement of Arabic text. We also report the results of a pilot experiment carried out on several well-known Arabic, Swedish, and English readability formulas.
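To make concrete what automating a readability formula involves, the following is a minimal sketch of LIX, a readability index widely used for Swedish. The abstract does not name the formulas tested in the pilot experiment, so the choice of LIX is an assumption made purely for illustration. The function name `lix`, the regex-based sentence and word splitting, and the handling of the Arabic question mark are likewise hypothetical choices, not the system proposed in the paper.

```python
import re


def lix(text: str, long_word_len: int = 6) -> float:
    """Compute the LIX index: words per sentence plus percentage of long words.

    Assumption: sentences end at '.', '!', '?', or the Arabic question mark,
    and a word is any run of Unicode word characters. This is a naive
    tokenization chosen for illustration only.
    """
    # Split into sentences and discard empty fragments.
    sentences = [s for s in re.split(r"[.!?\u061F]+", text) if s.strip()]
    # Extract words; \w is Unicode-aware in Python 3.
    words = re.findall(r"\w+", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    # LIX counts words with more than `long_word_len` letters as long.
    long_words = [w for w in words if len(w) > long_word_len]

    avg_sentence_len = len(words) / len(sentences)
    pct_long_words = 100.0 * len(long_words) / len(words)
    return avg_sentence_len + pct_long_words
```

For example, `lix("The cat sat. The extraordinary elephant walked slowly.")` counts 8 words, 2 sentences, and 2 long words, giving 8/2 + 100·2/8 = 29.0. Because LIX relies only on word and sentence lengths, it needs no syllable counting, although a realistic Arabic system would require more careful tokenization than this sketch provides.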
