Error Correction for Arabic Dictionary Lookup

We describe a new Arabic spelling correction system intended for use with electronic dictionary search by learners of Arabic. Unlike other spelling correction systems, it relies not on a corpus of attested student errors but on student- and teacher-generated ratings of confusable pairs of phonemes or letters. Separate error modules for keyboard mistypings, phonetic confusions, and dialectal confusions are combined into a weighted finite-state transducer that calculates the likelihood that an input string corresponds to each citation form in a dictionary of Iraqi Arabic. Results are ranked by the estimated likelihood that a citation form could be misheard, mistyped, or mistranscribed as the input given by the user. To evaluate the system, we developed a noisy-channel model trained on students’ speech errors and used it to perturb citation forms from a dictionary. We compare our system to a baseline based on Levenshtein distance and find that, when evaluated on single-error queries, our system performs 28% better than the baseline in overall mean reciprocal rank (MRR) and is twice as good at returning the correct dictionary form as the top-ranked result. We believe this to be the first spelling correction system designed for a spoken, colloquial dialect of Arabic.
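To make the evaluation setup concrete, the following is a minimal sketch of a Levenshtein-distance baseline and the MRR metric mentioned above. It is not the authors' implementation: the function names, the alphabetical tie-breaking rule, and the toy transliterated lexicon are all assumptions chosen for illustration, and the actual system ranks candidates with a weighted finite-state transducer rather than unit edit costs.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_candidates(query: str, lexicon: Sequence[str]) -> List[str]:
    """Rank citation forms by distance to the query; ties broken alphabetically (an assumption)."""
    return sorted(lexicon, key=lambda w: (levenshtein(query, w), w))


def mean_reciprocal_rank(queries: Sequence[Tuple[str, str]],
                         lexicon: Sequence[str]) -> float:
    """Average of 1/rank of the intended form; a form missing from the lexicon scores 0."""
    total = 0.0
    for query, target in queries:
        ranked = rank_candidates(query, lexicon)
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(queries)


# Hypothetical toy data: transliterated forms, not taken from the paper's dictionary.
LEXICON = ["kitab", "katib", "maktab", "kalb"]
QUERIES = [("kitap", "kitab"), ("katab", "kitab")]

if __name__ == "__main__":
    print(rank_candidates("katab", LEXICON))
    print(mean_reciprocal_rank(QUERIES, LEXICON))
```

In this sketch every edit costs the same, so a learner-typical confusion and an arbitrary substitution are indistinguishable. Replacing the unit costs with confusion-specific weights is the kind of refinement the abstract attributes to the proposed system.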
