Towards accurate predictors of word quality for Machine Translation: Lessons learned on French-English and English-Spanish systems

This paper proposes methods for building effective estimators that predict the quality of words in Machine Translation (MT) output. We introduce several novel features of various types (system-based, lexical, syntactic, and semantic) and integrate them with the conventional (previously used) feature set to train our baseline classifiers. The classifiers are built over two different bilingual corpora: French-English (fr-en) and English-Spanish (en-es). After experimenting with all features, we apply a feature selection strategy to retain the best-performing ones. Then, a method that combines multiple "weak" classifiers into a strong "composite" classifier, exploiting their complementarity, yields a significant F-score improvement for both the fr-en and en-es systems. Finally, we exploit word-level confidence scores to improve the quality estimation system at the sentence level.
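As a rough illustration of the two ideas sketched above (combining weak word-level classifiers into a composite one, and deriving a sentence-level score from word-level confidences), the following minimal Python sketch uses weighted voting with hypothetical dev-set F-scores as weights and a simple mean for sentence aggregation; these choices are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch: weighted-vote combination of "weak" word-level QE
# classifiers, plus a naive sentence-level score from word confidences.
# Weights and aggregation are illustrative assumptions, not the paper's method.

from typing import List


def combine_word_predictions(
    per_classifier_probs: List[List[float]],  # per_classifier_probs[k][i] = P(word i is GOOD) from classifier k
    classifier_weights: List[float],          # e.g. dev-set F-scores (assumed here)
) -> List[str]:
    """Weighted vote over word-level classifiers; returns one GOOD/BAD label per word."""
    n_words = len(per_classifier_probs[0])
    total_weight = sum(classifier_weights)
    labels = []
    for i in range(n_words):
        # Weighted average of P(GOOD) across the weak classifiers.
        p_good = sum(w * probs[i] for w, probs in zip(classifier_weights, per_classifier_probs))
        p_good /= total_weight
        labels.append("GOOD" if p_good >= 0.5 else "BAD")
    return labels


def sentence_score_from_word_confidences(word_confidences: List[float]) -> float:
    """Naive sentence-level quality score: mean of the word-level confidences."""
    return sum(word_confidences) / len(word_confidences) if word_confidences else 0.0


# Toy usage: two weak classifiers scoring a 4-word MT hypothesis.
probs_clf1 = [0.9, 0.4, 0.8, 0.2]
probs_clf2 = [0.7, 0.6, 0.9, 0.3]
weights = [0.62, 0.58]  # hypothetical dev-set F-scores used as weights
labels = combine_word_predictions([probs_clf1, probs_clf2], weights)
score = sentence_score_from_word_confidences([max(p, 1 - p) for p in probs_clf1])
print(labels, round(score, 3))
```

In practice the combination weights and the word-to-sentence aggregation could be tuned on held-out data; the sketch only shows the overall flow.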
