ArCADE: An Arabic Corpus of Auditory Dictation Errors

We present a new corpus of word-level listening errors collected from 62 native English speakers learning Arabic. The corpus is designed to inform models of spell checking for this learner population. While we use the corpus to assist in the automated detection and correction of auditory errors in electronic dictionary lookup, it can also serve as a phonological error layer, to be combined with a composition error layer in a more complex spell-checking system for non-native speakers. The corpus may also be useful to instructors of Arabic as a second language and to researchers who study second language phonology and listening perception.
