Goodness: A Method for Measuring Machine Translation Confidence

State-of-the-art statistical machine translation (MT) systems have made significant progress toward producing user-acceptable translation output. However, there is still no efficient way for MT systems to inform users which words are likely translated correctly and how confident the system is about the whole sentence. We propose a novel framework for predicting word-level and sentence-level MT errors with a large number of novel features. Experimental results show that MT error prediction accuracy increases from 69.1 to 72.2 in F-score. The Pearson correlation between the proposed confidence measure and the human-targeted translation edit rate (HTER) is 0.6. Reranking the n-best list with the proposed confidence measure yields TER reductions of 0.4 to 0.9 points. We also present a visualization prototype that displays MT errors at the word and sentence levels, with the aim of improving post-editor productivity.
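
As a rough illustration only (not the authors' implementation), the two headline evaluation quantities above can be computed as sketched below in Python, assuming the F-score is the balanced F1 over binary word-level error labels; the labels, confidence scores, and HTER values in the usage lines are made-up placeholders.

    # Minimal sketch of the evaluation metrics quoted in the abstract:
    # F1 for word-level error prediction and Pearson correlation between
    # a sentence-level confidence score and HTER. Inputs are illustrative.
    from typing import Sequence
    import math

    def f_score(gold: Sequence[int], predicted: Sequence[int]) -> float:
        """F1 over binary word-level error labels (1 = error, 0 = correct)."""
        tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
        fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
        fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    def pearson(confidence: Sequence[float], hter: Sequence[float]) -> float:
        """Pearson correlation between sentence-level confidence and HTER."""
        n = len(confidence)
        mc, mh = sum(confidence) / n, sum(hter) / n
        cov = sum((c - mc) * (h - mh) for c, h in zip(confidence, hter))
        var_c = sum((c - mc) ** 2 for c in confidence)
        var_h = sum((h - mh) ** 2 for h in hter)
        return cov / math.sqrt(var_c * var_h)

    # Hypothetical usage:
    print(f_score([1, 0, 1, 1], [1, 0, 0, 1]))           # word-level error F1
    print(pearson([0.9, 0.4, 0.7], [0.10, 0.55, 0.30]))  # confidence vs. HTER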
