Paper Information - Word Confidence Estimation and Its Integration in Sentence Quality Estimation for Machine Translation


This paper proposes several ideas for building an effective estimator that predicts the quality of words in Machine Translation (MT) output. We integrate features of various types (system-based, lexical, syntactic and semantic) into the conventional feature set to train our baseline classifier. After experimenting with all features, we apply a “Feature Selection” strategy to retain the best-performing ones. We then combine multiple “weak” classifiers into a strong “composite” classifier that exploits their complementarity, which yields better performance in terms of F-score. Finally, we exploit word confidence scores to improve the estimation system at the sentence level.
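The idea of combining complementary “weak” classifiers and scoring the result by F-score can be illustrated with a minimal sketch. The abstract does not specify the combination scheme, so weighted voting is used here purely as an assumed stand-in; the feature names (`lm_prob`, `align_score`, `nbest_freq`), the thresholds, and the toy data are all hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]
# A weak classifier maps a word's features to +1 ("good") or -1 ("bad").
WeakClassifier = Callable[[Features], int]


def threshold_classifier(feature: str, threshold: float) -> WeakClassifier:
    """Hypothetical weak classifier: a single-feature threshold rule."""
    return lambda x: 1 if x[feature] >= threshold else -1


def combine(classifiers: Sequence[WeakClassifier],
            weights: Sequence[float]) -> WeakClassifier:
    """Composite classifier: sign of the weighted sum of weak votes (assumed scheme)."""
    def composite(x: Features) -> int:
        score = sum(w * clf(x) for clf, w in zip(classifiers, weights))
        return 1 if score >= 0 else -1
    return composite


def f_score(gold: Sequence[int], predicted: Sequence[int], label: int) -> float:
    """F1 score (harmonic mean of precision and recall) for one label."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: four MT output words with made-up feature values and gold labels.
words: List[Features] = [
    {"lm_prob": 0.9, "align_score": 0.8, "nbest_freq": 0.4},
    {"lm_prob": 0.2, "align_score": 0.9, "nbest_freq": 0.1},
    {"lm_prob": 0.6, "align_score": 0.3, "nbest_freq": 0.2},
    {"lm_prob": 0.4, "align_score": 0.7, "nbest_freq": 0.8},
]
gold = [1, -1, -1, 1]

weak = [threshold_classifier(name, 0.5)
        for name in ("lm_prob", "align_score", "nbest_freq")]
composite = combine(weak, weights=[1.0, 1.0, 1.0])

lm_predictions = [weak[0](w) for w in words]
composite_predictions = [composite(w) for w in words]

# Compare the F-score on the "bad" label for one weak classifier vs. the composite.
print(f_score(gold, lm_predictions, label=-1))
print(f_score(gold, composite_predictions, label=-1))
```

In this toy setup each weak classifier errs on a different word, so the vote corrects every individual mistake; real gains depend on how complementary the classifiers actually are.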

Benjamin Lecouteux | Laurent Besacier | Ngoc-Quang Luong
