TECHLIMED@QALB-Shared Task 2015: A Hybrid Arabic Error Correction System

This paper reports on the participation of Techlimed in the Second Shared Task on Automatic Arabic Error Correction, organized by the Arabic Natural Language Processing Workshop. This year's competition has two tracks. In addition to correcting errors produced by native speakers (L1), it also covers the correction of texts written by learners of Arabic as a foreign language (L2). Techlimed participated in the L1 track, for which we developed two systems. The first is based on the Hunspell spellchecker with specific dictionaries. The second is a hybrid system that combines rules, morphological analysis and statistical machine translation. Our results on the test set show that the hybrid system outperforms the lexicon-driven approach, achieving a precision of 71.2%, a recall of 64.94% and an F-measure of 67.93%.
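
The reported F-measure can be checked against the reported precision and recall. The following is a minimal sketch that assumes the balanced F1 score (β = 1), i.e. the harmonic mean of precision and recall; it is not the shared task's official scorer, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Hypothetical helper for illustration; assumes precision and recall are
    given on the same scale (here, percentages).
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for the hybrid system on the L1 test set (percent).
p, r = 71.2, 64.94
print(f"F1 = {f_measure(p, r):.2f}%")  # prints "F1 = 67.93%"
```

With β = 1 the recomputed value matches the 67.93% reported above, which is consistent with the abstract's figures being a balanced F-measure.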
