Some Propositions to Improve the Prediction Capability of Word Confidence Estimation for Machine Translation

Word Confidence Estimation (WCE) is the task of predicting which words in machine translation (MT) output are correct and which are incorrect. To address this problem, this paper proposes several ideas for building a binary estimator and then enhancing its prediction capability. To build our classifier, we add a number of features of various types (system-based, lexical, syntactic and semantic) to the conventional feature set. After experimenting with all features, we apply a "Feature Selection" strategy to retain the best-performing ones. Next, we propose a method that combines multiple "weak" classifiers into a strong "composite" classifier by exploiting their complementarity. Experimental results show that our propositions achieve better performance in terms of F-score. Finally, we test whether WCE output can help improve a sentence-level confidence estimation system.
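The abstract does not say how the weak classifiers are combined. The sketch below is an illustration only. It assumes the combination uses discrete AdaBoost, a standard boosting technique, and is not presented as the authors' method. It also assumes that each weak classifier has already labelled every word as Good (+1) or Bad (-1), and that the F-score is computed on the Bad label. The functions `adaboost`, `predict` and `f_score` are hypothetical names introduced here.

```python
import math


def adaboost(weak_preds, labels, rounds):
    """Discrete AdaBoost over a fixed pool of weak classifiers (illustrative).

    weak_preds: one list per weak classifier, giving its label for every
                training word (+1 = Good, -1 = Bad; an assumed encoding).
    labels:     gold labels for the same words, +1 or -1.
    Returns a list of (classifier_index, alpha) pairs.
    """
    n = len(labels)
    w = [1.0 / n] * n  # start with uniform weights over words
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every weak classifier under the current weights.
        errors = [
            sum(wi for wi, p, y in zip(w, preds, labels) if p != y)
            for preds in weak_preds
        ]
        best = min(range(len(weak_preds)), key=lambda k: errors[k])
        err = errors[best]
        if err >= 0.5:
            break  # no weak classifier beats chance on the current weights
        err = max(err, 1e-10)  # guard against division by zero in the log
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((best, alpha))
        # Up-weight words the chosen classifier got wrong, down-weight the rest.
        w = [wi * math.exp(-alpha * y * p)
             for wi, p, y in zip(w, weak_preds[best], labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, weak_preds):
    """Weighted vote of the selected weak classifiers for every word."""
    n = len(weak_preds[0])
    scores = [sum(alpha * weak_preds[k][i] for k, alpha in ensemble)
              for i in range(n)]
    return [1 if s >= 0 else -1 for s in scores]


def f_score(gold, pred, label=-1):
    """F-score for a single label (here Bad = -1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): three weak classifiers,
# each wrong on a different word.
labels = [1, 1, -1, -1, 1, -1]
weak = [
    [-1, 1, -1, -1, 1, -1],  # wrong on word 0
    [1, 1, 1, -1, 1, -1],    # wrong on word 2
    [1, 1, -1, -1, -1, -1],  # wrong on word 4
]
ensemble = adaboost(weak, labels, rounds=3)
combined = predict(ensemble, weak)
print([round(f_score(labels, p), 3) for p in weak], f_score(labels, combined))
# prints [0.857, 0.8, 0.857] 1.0
```

In this toy data each weak classifier errs on a different word. The weighted vote therefore recovers all six labels, a small instance of the complementarity that the composite classifier is meant to exploit.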
