Correction Annotation for Non-Native Arabic Texts: Guidelines and Corpus

We present our correction annotation guidelines for creating a manually corrected non-native (L2) Arabic corpus. We develop our approach by extending a large-scale L1 Arabic corpus and its manual corrections to include manually corrected essays written by non-native learners of Arabic. Our overarching goal is to use the annotated corpus to develop components for the automatic detection and correction of language errors, which can help learners of Standard Arabic (native and non-native) improve the quality of the Arabic text they produce. The resulting corpus is, to date, the largest collection of manually corrected L2 Arabic text. We evaluate our guidelines using inter-annotator agreement and show a high degree of consistency.
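The abstract does not state how inter-annotator agreement is scored. One common way to compare two annotators' corrections of the same text is word-level edit distance normalized by length (a word error rate, where lower means closer agreement). The sketch below assumes that metric. The helper names `word_edit_distance` and `word_error_rate` are hypothetical and are not taken from the authors' tooling.

```python
# Hypothetical helpers: word-level agreement between two annotators'
# corrected versions of the same source text. This is an assumed metric,
# not necessarily the one used by the authors.

def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over word tokens (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete r
                curr[j - 1] + 1,     # insert h
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[-1]


def word_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the length of the reference correction."""
    ref_tokens, hyp_tokens = ref.split(), hyp.split()
    if not ref_tokens:
        raise ValueError("reference correction must contain at least one word")
    return word_edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Two annotators correct the same sentence ("The student went to the school").
# One restores the hamza on the preposition (إلى); the other leaves it as الى.
annotator_a = "ذهب الطالب إلى المدرسة"
annotator_b = "ذهب الطالب الى المدرسة"
wer = word_error_rate(annotator_a, annotator_b)
print(f"WER = {wer:.2f}, agreement = {1 - wer:.2f}")  # WER = 0.25, agreement = 0.75
```

Reporting agreement as 1 minus WER is a convention assumed here. Edit distance is used rather than a position-by-position comparison because corrections can insert, delete, split, or merge words, which would otherwise misalign the two annotators' outputs.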
