A Large-Scale Leveled Readability Lexicon for Standard Arabic

We present a large-scale, 26,000-lemma leveled readability lexicon for Modern Standard Arabic. The lexicon was manually annotated in triplicate by language professionals from three regions in the Arab world. The annotations show a high degree of agreement, and major differences were limited to regional variations. Comparing lemma readability levels with their frequencies provided good insight into the benefits and pitfalls of frequency-based readability approaches. The lexicon will be publicly available.
