Aralex: A lexical database for Modern Standard Arabic

In this article, we present a new lexical database for Modern Standard Arabic: Aralex. Based on a contemporary text corpus of 40 million words, Aralex provides information about (1) the token frequencies of roots and word patterns, (2) the type frequency, or family size, of roots and word patterns, and (3) the frequency of bigrams and trigrams in orthographic forms, roots, and word patterns. Aralex will be a useful tool for studying the cognitive processing of Arabic, since it allows stimuli to be selected on the basis of precise frequency counts. Researchers can also use it as a source of information for natural language processing, and it may serve an educational purpose by providing basic vocabulary lists. Aralex is distributed under a GNU-like license, allowing people to query it freely online or to download it from www.mrc-cbu.cam.ac.uk:8081/aralex.online/login.jsp.
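The three kinds of counts listed above can be illustrated with a minimal Python sketch. It is not Aralex's implementation or data format: the entries, their transliterations, the pattern labels, and the counts are all invented for illustration, and the function names are hypothetical. The sketch also assumes that n-gram frequencies are weighted by token count, which the abstract does not specify.

```python
from collections import Counter

# Toy entries: (surface form, root, word pattern, corpus count).
# All values are invented for illustration; they are not Aralex data.
ENTRIES = [
    ("kataba", "ktb", "faEala", 120),
    ("kitaab", "ktb", "fiEaal", 300),
    ("kaatib", "ktb", "faaEil", 80),
    ("darasa", "drs", "faEala", 60),
    ("daaris", "drs", "faaEil", 15),
]

FORM, ROOT, PATTERN, COUNT = 0, 1, 2, 3


def token_frequency(entries, field):
    """Sum the corpus counts of all forms sharing a root or pattern."""
    totals = Counter()
    for entry in entries:
        totals[entry[field]] += entry[COUNT]
    return totals


def family_size(entries, field):
    """Count the distinct surface forms (types) sharing a root or pattern."""
    members = {}
    for entry in entries:
        members.setdefault(entry[field], set()).add(entry[FORM])
    return {key: len(forms) for key, forms in members.items()}


def ngram_frequency(entries, n, field=FORM):
    """Count character n-grams in the chosen field.

    Assumption: each occurrence is weighted by the entry's token count.
    """
    totals = Counter()
    for entry in entries:
        text = entry[field]
        for i in range(len(text) - n + 1):
            totals[text[i:i + n]] += entry[COUNT]
    return totals


if __name__ == "__main__":
    print(token_frequency(ENTRIES, ROOT))
    print(family_size(ENTRIES, ROOT))
    print(ngram_frequency(ENTRIES, 2, FORM).most_common(3))
```

In this toy data the root "ktb" has a high token frequency because its three forms are frequent, while its family size is simply the number of distinct forms built on it. The two measures can diverge, which is why a database would report them separately.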
