
New Confidence Measures for Statistical Machine Translation

A confidence measure estimates the reliability of a hypothesis produced by a machine translation system. Confidence estimation can be framed as a testing problem: we want to decide whether the most probable word sequence output by the machine translation system is correct or not. We describe several original word-level confidence measures for machine translation, based on mutual information, an n-gram language model, and a lexical-feature language model. We evaluate how well they perform individually and in combination, and show that combining the confidence measures based on mutual information yields a classification error rate as low as 25.1% with an F-measure of 0.708.
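The abstract does not give the exact formulas, so the Python below is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (hypothetically) that the mutual-information confidence of a hypothesis word is the maximum pointwise mutual information between that word and any word of the source sentence, estimated from sentence-level co-occurrence counts in a parallel corpus. A word is labeled correct when its score reaches a threshold. The classification error rate is the fraction of words whose predicted label disagrees with the reference label, and the F-measure is assumed to be computed on the "correct" class. The function names `train_cooccurrence`, `word_confidence`, `classify` and `evaluate` are illustrative and do not come from the paper.

```python
import math
from collections import Counter
from itertools import product


def train_cooccurrence(parallel_corpus):
    """Count the sentence pairs containing each source word, each target word
    and each (source, target) word pair. parallel_corpus is a list of
    (source_tokens, target_tokens) pairs."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in parallel_corpus:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return src_count, tgt_count, pair_count, len(parallel_corpus)


def word_confidence(src_tokens, tgt_word, stats):
    """Assumed MI-based score: the maximum pointwise mutual information
    log(P(s, t) / (P(s) P(t))) between tgt_word and any source word s.
    Returns -inf if tgt_word never co-occurs with the source words."""
    src_count, tgt_count, pair_count, n = stats
    best = float("-inf")
    for s in set(src_tokens):
        joint = pair_count[(s, tgt_word)]
        if joint == 0:
            continue  # PMI is undefined without co-occurrence; skip this pair
        pmi = math.log(joint * n / (src_count[s] * tgt_count[tgt_word]))
        best = max(best, pmi)
    return best


def classify(scores, threshold):
    """Label a word as correct (True) when its score reaches the threshold."""
    return [score >= threshold for score in scores]


def evaluate(predicted, gold):
    """Return (classification error rate, F-measure on the 'correct' class)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    cer = (fp + fn) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return cer, f_measure


if __name__ == "__main__":
    # Toy parallel corpus, invented purely for illustration.
    corpus = [
        ("le chat dort".split(), "the cat sleeps".split()),
        ("le chien dort".split(), "the dog sleeps".split()),
        ("un chat mange".split(), "a cat eats".split()),
    ]
    stats = train_cooccurrence(corpus)
    source = "un chat mange".split()
    hypothesis = "a cat sleeps".split()  # "sleeps" is a translation error
    gold = [True, True, False]
    scores = [word_confidence(source, w, stats) for w in hypothesis]
    for threshold in (0.0, 0.5):
        predicted = classify(scores, threshold)
        print(threshold, predicted, evaluate(predicted, gold))
```

On this toy data, `a` and `cat` score above zero and `sleeps` below it, so a threshold of 0 reproduces the gold labels exactly. Raising the threshold to 0.5 also rejects `cat`, giving a classification error rate of 1/3 and an F-measure of about 0.67. This illustrates why the decision threshold has to be tuned when reporting error rate and F-measure together.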
