ALRT: Cutting Edge Tool for Automatic Generation of Arabic Lexical Recognition Tests

A Lexical Recognition Test (LRT) is a widely used tool for measuring a language learner's proficiency through vocabulary size (that is, the number of words the learner has acquired). LRTs exist for several international languages, including English, Arabic, German, Chinese, and Spanish. Compared with those languages, LRTs for Arabic are less mature and leave room for improvement: the very few existing proposals rely mainly on human-crafted or semi-automated methods built on Arabic Natural Language Processing (NLP) techniques. This paper introduces ALRT, the Arabic Lexical Recognition Tests Tool, for the automatic generation of Arabic LRTs. The tool was tested on a large dataset of Arabic vocabulary, and a subject-matter expert was involved as an additional validation step to verify the quality of the generated nonwords.
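Nonword quality matters because of how a yes/no recognition test is scored. The following is a minimal sketch, under our own assumptions, of one such scoring scheme: the learner marks each item as known or unknown, "yes" answers to real words count as hits, and "yes" answers to nonwords count as false alarms. The rule shown (hit rate minus false-alarm rate) is one simple correction for guessing and is not necessarily the scoring used by ALRT; the names `Item` and `score_yes_no` and the nonword placeholders are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    is_word: bool  # True for a real word, False for a generated nonword


def score_yes_no(items, answers):
    """Return (hit_rate, false_alarm_rate, corrected_score).

    answers[i] is True when the learner claims to know items[i].
    The corrected score is hit rate minus false-alarm rate (H - FA),
    which penalises learners who answer "yes" to nonwords.
    This is an illustrative scheme, not ALRT's documented method.
    """
    if len(items) != len(answers):
        raise ValueError("items and answers must have the same length")
    # Split the learner's answers by item type.
    word_answers = [a for it, a in zip(items, answers) if it.is_word]
    nonword_answers = [a for it, a in zip(items, answers) if not it.is_word]
    if not word_answers or not nonword_answers:
        raise ValueError("a test needs at least one word and one nonword")
    hit_rate = sum(word_answers) / len(word_answers)
    false_alarm_rate = sum(nonword_answers) / len(nonword_answers)
    return hit_rate, false_alarm_rate, hit_rate - false_alarm_rate


# Hypothetical test: four real Arabic words and two placeholder nonwords.
items = [
    Item("كتاب", True),   # book
    Item("قلم", True),    # pen
    Item("بيت", True),    # house
    Item("مدرسة", True),  # school
    Item("<nonword-1>", False),
    Item("<nonword-2>", False),
]
answers = [True, True, True, False, True, False]
print(score_yes_no(items, answers))  # (0.75, 0.5, 0.25)
```

If a generated nonword looks too much like a real word, even proficient learners will say "yes" to it, inflating the false-alarm rate and distorting the score; this is why generated nonwords need validation.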
