Find the errors, get the better: Enhancing machine translation via word confidence estimation

This article presents two novel ideas for improving Machine Translation (MT) quality by applying word-level quality prediction in a second pass of decoding. The word scores estimated by Word Confidence Estimation (WCE) systems are used to reconsider the MT hypotheses and select a better candidate rather than accepting the current sub-optimal one. In the first approach, the selection scope is limited to the MT N-best list, in which our proposed re-ranking features are combined with those of the decoder for re-scoring. In the second, the search space is enlarged to the entire search graph, which stores many more hypotheses generated during the first pass of decoding. Over all paths containing words of the N-best list, we propose an algorithm to strengthen or weaken them depending on the estimated word quality. In both methods, the highest-scoring candidate after the search becomes the official translation. The results show that both approaches improve MT quality over the one-pass baseline, and that Search Graph Re-decoding achieves larger gains (in BLEU score) than N-best List Re-ranking.
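
The N-best re-ranking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` class, the mean-confidence feature, and the weights `w_decoder` and `w_wce` are all hypothetical names chosen for illustration, and the abstract does not specify which re-ranking features are actually used. The sketch only shows how a WCE-derived score might be combined linearly with the first-pass decoder score to pick a new best candidate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (illustrative structure, not from the paper)."""
    text: str
    decoder_score: float            # model score from the first decoding pass
    word_confidences: List[float]   # assumed WCE output: one score in [0, 1] per word


def wce_feature(hyp: Hypothesis) -> float:
    """Mean word confidence: an assumed stand-in for the paper's re-ranking features."""
    if not hyp.word_confidences:
        return 0.0
    return sum(hyp.word_confidences) / len(hyp.word_confidences)


def rerank(nbest: List[Hypothesis],
           w_decoder: float = 1.0,
           w_wce: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses by a weighted sum of decoder score and WCE feature, best first."""
    def combined(h: Hypothesis) -> float:
        return w_decoder * h.decoder_score + w_wce * wce_feature(h)
    return sorted(nbest, key=combined, reverse=True)


# Usage: the decoder prefers the first hypothesis, but its words are judged unreliable.
nbest = [
    Hypothesis("hypothesis A", decoder_score=-1.0, word_confidences=[0.2, 0.2]),
    Hypothesis("hypothesis B", decoder_score=-1.2, word_confidences=[0.9, 0.9]),
]
best = rerank(nbest)[0]
```

In a real system the weights would be tuned on held-out data rather than fixed by hand; the values here are placeholders.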
