Lexical Micro-adaptation in Statistical Machine Translation

We introduce a generic framework for Statistical Machine Translation (SMT) in which lexical hypotheses, in the form of a target language model local to the input sentence, guide the search for the best translation, thereby performing a lexical micro-adaptation. We present an instantiation of this framework and evaluate it on three language pairs, deriving the auxiliary hypotheses through triangulation via an auxiliary language. Our first experiments consider nine auxiliary languages, which lets us measure their individual contributions. We then combine the hypotheses from all of them through consensus decoding. Our experiments show that SMT systems can be improved by automatically produced auxiliary hypotheses.

KEYWORDS: statistical machine translation, pivot translation.
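To make the idea of a sentence-local target language model concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the system evaluated here: the local model is reduced to a smoothed unigram model, and it rescores a list of candidate translations rather than guiding the decoder's search. The function names, the `weight` parameter, the smoothing constant, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter


def local_unigram_lm(aux_hypotheses, vocab_size_hint=1000):
    """Build an add-one-smoothed unigram model from auxiliary hypotheses.

    aux_hypotheses: target-language strings obtained for ONE input sentence,
    e.g. by translating it through several auxiliary (pivot) languages.
    vocab_size_hint is an assumed constant used only for smoothing.
    """
    counts = Counter(
        token for hyp in aux_hypotheses for token in hyp.lower().split()
    )
    total = sum(counts.values())

    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab_size_hint))

    return logprob


def rerank(candidates, aux_hypotheses, weight=1.0):
    """Reorder (translation, baseline_log_score) pairs, best first.

    The combined score adds the length-normalised local LM log-probability,
    scaled by `weight`, to the baseline score. This is a simplification:
    it rescores finished candidates instead of steering the search.
    """
    logprob = local_unigram_lm(aux_hypotheses)

    def combined(item):
        text, base_score = item
        tokens = text.lower().split()
        local = sum(logprob(t) for t in tokens) / max(len(tokens), 1)
        return base_score + weight * local

    return sorted(candidates, key=combined, reverse=True)


if __name__ == "__main__":
    # Toy data: two candidates for one input sentence, plus two auxiliary
    # hypotheses pooled from (hypothetical) pivot translations.
    candidates = [
        ("the bank of the river", -2.0),
        ("the shore of the river", -2.05),
    ]
    aux = ["the river shore", "the shore of the stream"]
    for text, score in rerank(candidates, aux):
        print(text, score)
```

Pooling the auxiliary hypotheses into a single model is one simple way to approximate a consensus across pivot languages: a word supported by several hypotheses receives more probability mass, so candidates that use it are favoured.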
