Use of auxiliary translation for improving decoding in statistical machine translation

Recently, the concept of driven decoding (DD) has been successfully applied to the automatic speech recognition (speech-to-text) task: an auxiliary transcription guides the decoding process. There is strong interest in applying this concept to statistical machine translation (SMT). This paper presents our approach to this problem. Our first attempt at driven decoding consists of adding several feature functions that correspond to the distance between the current decoded hypothesis and the available auxiliary translations. Experimental results on a French-to-English machine translation task, in the framework of the WMT 2011 evaluation, show the potential of the proposed DD approach.
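The idea of scoring a hypothesis by its distance to auxiliary translations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the distance measure, so this sketch uses a word-level edit distance against the closest prefix of each auxiliary translation, normalized by hypothesis length. The names `prefix_edit_distance`, `dd_features` and `loglinear_score` are hypothetical. In a log-linear SMT decoder such as Moses, each such feature would get its own weight, tuned together with the other model weights (for example with MERT).

```python
from typing import List, Sequence


def prefix_edit_distance(hyp: Sequence[str], aux: Sequence[str]) -> int:
    """Word-level Levenshtein distance between `hyp` and the closest prefix of `aux`.

    Assumption (illustrative): a partial hypothesis grows left to right, so it
    is compared with prefixes of the auxiliary translation rather than with the
    whole sentence.
    """
    # prev[j] = distance between hyp[:i] and aux[:j] for the current row i.
    prev = list(range(len(aux) + 1))
    for i in range(1, len(hyp) + 1):
        curr = [i] + [0] * len(aux)
        for j in range(1, len(aux) + 1):
            cost = 0 if hyp[i - 1] == aux[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a hypothesis word
                curr[j - 1] + 1,     # insert an auxiliary word
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    # Best match over every prefix length of the auxiliary translation.
    return min(prev)


def dd_features(hyp: Sequence[str], aux_translations: Sequence[Sequence[str]]) -> List[float]:
    """Hypothetical DD features: one length-normalized distance per auxiliary translation."""
    n = max(len(hyp), 1)
    return [prefix_edit_distance(hyp, aux) / n for aux in aux_translations]


def loglinear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Log-linear model score: sum over k of lambda_k * h_k."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature is required")
    return sum(lam * h for lam, h in zip(weights, features))


if __name__ == "__main__":
    aux = ["the house is small".split(), "the home is very small".split()]
    hyp = "the house is".split()
    feats = dd_features(hyp, aux)
    # Hypothetical baseline features (e.g. translation-model and LM log-probs)
    # followed by the DD features; negative DD weights favor close hypotheses.
    score = loglinear_score([-2.3, -4.1] + feats, [1.0, 0.5, -1.0, -1.0])
    print(feats, score)
```

Comparing against prefixes avoids penalizing a partial hypothesis merely because it is not finished yet. With negative weights on the distance features, hypotheses that stay close to the auxiliary translations are favored during search.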
