Confidence estimation for machine translation using context vectors

Machine translation (MT) has advanced considerably in recent years, but it still cannot reliably deliver high-quality translations, so post-editing is needed. Because post-editing can take even more time than the translation process itself, an estimate of the quality of the translated parts can be very useful. This means estimating the confidence of the output without any reference translations. Confidence estimation (CE) is also useful for applications that aim to improve MT quality, such as system combination, regeneration, and pruning. However, there is not yet a fully satisfactory method for the CE task. We propose context vector-based features that have not previously been used for CE, and we classify MT output at the word level. We show that each proposed feature outperforms the baseline systems. The combination of the proposed features outperforms the best baseline system by 5.68% relative in CER, 3.88% relative in F-measure, and 7.30% relative in negative-class F-measure. Combining the proposed features with the baseline features also yields a noticeable improvement over the baseline systems.
