Generalized Character-Level Spelling Error Correction

We present a generalized discriminative model for spelling error correction that targets character-level transformations. Although it operates at the character level, the model makes use of word-level and contextual information. In contrast to previous work, the proposed approach learns to correct a variety of error types without the guidance of manually selected constraints or language-specific features. We apply the model to correct errors in Egyptian Arabic dialect text, achieving a 65% reduction in word error rate relative to the uncorrected input baseline and improving over the earlier state-of-the-art system.
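The abstract does not spell out how corrections are cast as character-level transformations. One common framing, shown here as a hypothetical sketch and not as the authors' exact formulation, is to align each erroneous word with its corrected form by Levenshtein edit distance and give every input character an output label: itself, a replacement character, the empty string (a deletion), or a longer string that absorbs inserted characters. A discriminative classifier can then predict one label per character. The feature template `char_features` (a character window, the whole word, and the neighbouring words) is likewise an illustrative assumption standing in for "word-level and contextual information"; all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, int, str]  # (operation, source index, target string)


def edit_ops(src: str, tgt: str) -> List[Op]:
    """Levenshtein alignment of src to tgt, returned in left-to-right order."""
    n, m = len(src), len(tgt)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:  # backtrace, preferring diagonal moves
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            ops.append(("sub", i - 1, tgt[j - 1]))  # also covers exact matches
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, ""))
            i -= 1
        else:
            ops.append(("ins", i, tgt[j - 1]))  # inserted before src[i]
            j -= 1
    ops.reverse()
    return ops


def char_labels(src: str, tgt: str) -> List[str]:
    """One output label per source character, so that ''.join(labels) == tgt."""
    if not src:
        raise ValueError("source word must be non-empty")
    labels = [""] * len(src)
    prefix = ""  # characters inserted before the first source character
    for op, i, ch in edit_ops(src, tgt):
        if op == "ins":
            if i == 0:
                prefix += ch
            else:
                labels[i - 1] += ch  # attach insertion to the preceding character
        else:  # "sub" (including identity) or "del" (ch == "")
            labels[i] = (prefix if i == 0 else "") + ch
    return labels


def char_features(words: List[str], k: int, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical features for character i of words[k]."""
    word = words[k]
    padded = "#" * window + word + "#" * window  # '#' pads word boundaries
    feats = {f"c[{off}]": padded[i + window + off] for off in range(-window, window + 1)}
    feats["word"] = word  # word-level information
    feats["prev_word"] = words[k - 1] if k > 0 else "<s>"  # sentence context
    feats["next_word"] = words[k + 1] if k + 1 < len(words) else "</s>"
    return feats
```

For example, `char_labels("helo", "hello")` yields `["h", "el", "l", "o"]`, and `char_labels("thhe", "the")` yields `["t", "", "h", "e"]`. Attaching insertions to the preceding character keeps exactly one prediction per input character, which lets a per-character classifier handle insertions, deletions, and substitutions uniformly.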
