Towards accurate predictors of word quality for Machine Translation: Lessons learned on French-English and English-Spanish systems

This paper proposes methods for building effective estimators that predict the quality of words in Machine Translation (MT) output. We introduce several novel features of various types (system-based, lexical, syntactic, and semantic) and integrate them with the conventional (previously used) feature set to train our baseline classifiers. The classifiers are built over two different bilingual corpora: French-English (fr-en) and English-Spanish (en-es). After experimenting with all features, we apply a feature selection strategy to retain the best-performing ones. Then, a method that combines multiple "weak" classifiers into a strong "composite" classifier, exploiting their complementarity, yields a significant F-score improvement for both the fr-en and en-es systems. Finally, we exploit word-level confidence scores to improve the quality estimation system at the sentence level.
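As a rough illustration of the two ideas sketched above (combining weak word-level classifiers into a composite one, and deriving a sentence-level score from word-level confidences), the following minimal Python sketch uses weighted voting with hypothetical dev-set F-scores as weights and a simple mean for sentence aggregation; these choices are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch: weighted-vote combination of "weak" word-level QE
# classifiers, plus a naive sentence-level score from word confidences.
# Weights and aggregation are illustrative assumptions, not the paper's method.

from typing import List


def combine_word_predictions(
    per_classifier_probs: List[List[float]],  # per_classifier_probs[k][i] = P(word i is GOOD) from classifier k
    classifier_weights: List[float],          # e.g. dev-set F-scores (assumed here)
) -> List[str]:
    """Weighted vote over word-level classifiers; returns one GOOD/BAD label per word."""
    n_words = len(per_classifier_probs[0])
    total_weight = sum(classifier_weights)
    labels = []
    for i in range(n_words):
        # Weighted average of P(GOOD) across the weak classifiers.
        p_good = sum(w * probs[i] for w, probs in zip(classifier_weights, per_classifier_probs))
        p_good /= total_weight
        labels.append("GOOD" if p_good >= 0.5 else "BAD")
    return labels


def sentence_score_from_word_confidences(word_confidences: List[float]) -> float:
    """Naive sentence-level quality score: mean of the word-level confidences."""
    return sum(word_confidences) / len(word_confidences) if word_confidences else 0.0


# Toy usage: two weak classifiers scoring a 4-word MT hypothesis.
probs_clf1 = [0.9, 0.4, 0.8, 0.2]
probs_clf2 = [0.7, 0.6, 0.9, 0.3]
weights = [0.62, 0.58]  # hypothetical dev-set F-scores used as weights
labels = combine_word_predictions([probs_clf1, probs_clf2], weights)
score = sentence_score_from_word_confidences([max(p, 1 - p) for p in probs_clf1])
print(labels, round(score, 3))
```

In practice the combination weights and the word-to-sentence aggregation could be tuned on held-out data; the sketch only shows the overall flow.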
