Revisit Automatic Error Detection for Wrong and Missing Translation – A Supervised Approach

While achieving great fluency, current machine translation (MT) systems remain bottlenecked by adequacy issues. To study these issues more closely and accelerate model development, we propose automatically detecting adequacy errors in MT hypotheses for MT model evaluation. To that end, we annotate missing and wrong translations, the two most prevalent adequacy issues in current neural machine translation models, in 15,000 Chinese-English translation pairs. We build a supervised alignment model for translation error detection (AlignDet), based on a simple Alignment Triangle strategy, to set a benchmark for the automatic error detection task. We also discuss the difficulties of this task and its benefits for existing evaluation metrics.
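The abstract does not spell out how AlignDet works, but the Alignment Triangle idea (relating source, reference, and hypothesis through pairwise word alignments) can be illustrated with a toy sketch. The Python below is an assumption-laden illustration, not the authors' model: it composes source-reference and reference-hypothesis links into expected source-hypothesis links, then flags unlinked source tokens as candidate missing translations and mismatched hypothesis tokens as candidate wrong translations. All names (compose, flag_errors) and the toy alignments are hypothetical.

    from typing import List, Set, Tuple

    Alignment = Set[Tuple[int, int]]  # links between token indices of two sentences

    def compose(src_ref: Alignment, ref_hyp: Alignment) -> Alignment:
        """Compose source-reference and reference-hypothesis links into the
        expected source-hypothesis links (the third side of the triangle)."""
        return {(s, h) for (s, r1) in src_ref for (r2, h) in ref_hyp if r1 == r2}

    def flag_errors(src: List[str], ref: List[str], hyp: List[str],
                    src_ref: Alignment, ref_hyp: Alignment):
        """Flag candidate adequacy errors:
          missing - source tokens with no composed link into the hypothesis;
          wrong   - hypothesis tokens whose linked source token is rendered
                    differently in the reference."""
        src_hyp = compose(src_ref, ref_hyp)
        covered = {s for (s, _) in src_hyp}
        missing = [src[i] for i in range(len(src)) if i not in covered]
        wrong = []
        for (s, h) in src_hyp:
            ref_words = {ref[r] for (s2, r) in src_ref if s2 == s}
            if hyp[h] not in ref_words:
                wrong.append((src[s], hyp[h]))
        return missing, wrong

    # Toy usage with hand-written (hypothetical) alignments:
    src = ["他", "昨天", "买", "书"]                 # "he yesterday buy book"
    ref = ["he", "bought", "a", "book", "yesterday"]
    hyp = ["he", "bought", "a", "pen"]
    src_ref = {(0, 0), (2, 1), (3, 3), (1, 4)}       # source-reference links
    ref_hyp = {(0, 0), (1, 1), (2, 2), (3, 3)}       # reference-hypothesis links
    print(flag_errors(src, ref, hyp, src_ref, ref_hyp))
    # -> (['昨天'], [('书', 'pen')]): "yesterday" is missing, "book" is mistranslated

In practice the alignments themselves would come from a learned aligner rather than being hand-written, and a supervised model such as AlignDet would score candidate errors rather than flagging them with hard rules.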
