An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation

A growing need for Confidence Estimation (CE) for Statistical Machine Translation (SMT) systems in Computer-Aided Translation (CAT) has recently been observed. However, most CE toolkits are optimized for a single target language (mainly English) and, as far as we know, none of them is both dedicated to this specific task and freely available. This paper presents an open-source toolkit for predicting the quality of individual words in SMT output. Its novel contributions are (i) support for various target languages and (ii) the handling of features of different types (system-based, lexical, syntactic and semantic). In addition, the toolkit integrates a wide variety of Natural Language Processing and Machine Learning tools to pre-process data, extract features and estimate confidence at the word level. Features for Word-level Confidence Estimation (WCE) can be easily added or removed using a configuration file. We validate the toolkit through experiments in the WCE evaluation framework of the WMT shared task on two language pairs: French-English and English-Spanish. The toolkit is made available to the research community with ready-made scripts to launch full experiments on these language pairs, achieving state-of-the-art and reproducible performance.
