Word-Level Confidence Estimation for Machine Translation

This article introduces and evaluates several word-level confidence measures for machine translation. These measures provide a method for labeling each word in an automatically generated translation as correct or incorrect. All approaches to confidence estimation presented here are based on word posterior probabilities. Different concepts of word posterior probabilities, as well as different ways of calculating them, are introduced and compared. They can be divided into two categories: system-based methods, which exploit knowledge provided by the translation system that generated the translations, and direct methods, which are independent of the translation system. The system-based techniques make use of system output, such as word graphs or N-best lists. The word posterior probability is determined by summing the probabilities of the sentences in the translation hypothesis space that contain the target word. The direct confidence measures take other knowledge sources, such as word or phrase lexica, into account. They can be applied to output from nonstatistical machine translation systems as well. An experimental assessment of the different confidence measures on various translation tasks and in several language pairs is presented. Moreover, the application of confidence measures to the rescoring of translation hypotheses is investigated.
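As a rough illustration of the system-based idea, the sketch below estimates word posterior probabilities from an N-best list and tags each word by thresholding. It is a minimal sketch under stated assumptions, not the article's implementation: the function names `word_posteriors` and `tag_words`, the log-score input format, the position-independent matching criterion (a word counts if it occurs anywhere in a hypothesis), and the threshold value are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def word_posteriors(nbest):
    """Estimate word posteriors from an N-best list (illustrative sketch).

    nbest: list of (tokens, log_score) pairs for one source sentence.
    Assumption: a word "occurs" in a hypothesis if it appears at any
    position; other matching criteria are possible.
    """
    # Normalize the log scores into sentence probabilities over the list,
    # subtracting the maximum first for numerical stability.
    max_log = max(log_score for _, log_score in nbest)
    weights = [math.exp(log_score - max_log) for _, log_score in nbest]
    total = sum(weights)

    posteriors = defaultdict(float)
    for (tokens, _), weight in zip(nbest, weights):
        # Sum the probability of every hypothesis that contains the word;
        # set() ensures a repeated word is counted once per hypothesis.
        for word in set(tokens):
            posteriors[word] += weight / total
    return dict(posteriors)


def tag_words(hypothesis, posteriors, threshold=0.5):
    """Label each word as correct or incorrect by thresholding its posterior.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    return [
        (word, "correct" if posteriors.get(word, 0.0) >= threshold else "incorrect")
        for word in hypothesis
    ]


if __name__ == "__main__":
    # Toy N-best list with made-up scores, for illustration only.
    nbest = [
        (["the", "house", "is", "small"], math.log(0.5)),
        (["the", "house", "is", "little"], math.log(0.3)),
        (["a", "home", "is", "small"], math.log(0.2)),
    ]
    posteriors = word_posteriors(nbest)
    for word, label in tag_words(nbest[0][0], posteriors, threshold=0.75):
        print(f"{word}\t{posteriors[word]:.2f}\t{label}")
```

The same summation carries over to word graphs, where the hypothesis space is larger and the sums are typically computed with a forward-backward pass rather than by enumerating sentences.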
