Automatic Error Analysis for Morphologically Rich Languages

This paper presents AMEANA, an open-source tool for error analysis of natural language processing tasks targeting morphologically rich languages. Unlike standard evaluation metrics such as BLEU or WER, AMEANA automatically provides a detailed error analysis that can help researchers and developers better understand the strengths and weaknesses of their systems. AMEANA is easily adaptable to any language for which a morphological analyzer exists. In this paper, we focus on usability in the context of Machine Translation (MT) and demonstrate the tool specifically for English-to-Arabic MT.
