LIMSI’s Statistical Translation Systems for WMT’08

This paper describes our statistical machine translation systems for the WMT08 shared task, which are based on the Moses toolkit. We address the Europarl and News conditions for English paired with French, German and Spanish. For Europarl, n-best lists are rescored with either an enhanced n-gram language model or a neural language model. For the News condition, the language models are trained on additional data. We also report the unconvincing results of our experiments with factored models.
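N-best rescoring, mentioned above for the Europarl condition, generally means taking the decoder's list of candidate translations, adding a weighted score from a second language model, and re-sorting. The following is a minimal Python sketch of that general idea under stated assumptions. The `Hypothesis` class, the `rescore_nbest` function, the single `lm_weight` parameter and the toy scoring function are all hypothetical. They are not the features, weights or code of the LIMSI system, and in practice such a weight would be tuned on development data.

```python
"""Minimal sketch of n-best rescoring with an additional language model.

Assumptions (hypothetical, not taken from the paper): each hypothesis carries
the first-pass decoder's total log-linear score, and `extra_lm_logprob` stands
in for a rescoring LM (e.g. an n-gram or neural LM) returning a log-probability.
"""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str             # target-side translation candidate
    decoder_score: float  # log-linear score from the first-pass decoder


def rescore_nbest(
    nbest: List[Hypothesis],
    extra_lm_logprob: Callable[[str], float],
    lm_weight: float,
) -> List[Hypothesis]:
    """Re-sort an n-best list by decoder score plus weighted extra-LM score."""

    def combined_score(h: Hypothesis) -> float:
        # Log-linear combination: the new LM acts as one extra weighted feature.
        return h.decoder_score + lm_weight * extra_lm_logprob(h.text)

    # Best combined score first; sorted() is stable, so ties keep decoder order.
    return sorted(nbest, key=combined_score, reverse=True)


if __name__ == "__main__":
    # Toy scoring function standing in for a real LM (purely illustrative):
    # it penalizes each word and rewards the phrase "the house".
    def toy_lm(sentence: str) -> float:
        bonus = 2.0 if "the house" in sentence else 0.0
        return -1.0 * len(sentence.split()) + bonus

    nbest = [
        Hypothesis("house the is small", -10.0),
        Hypothesis("the house is small", -10.5),
    ]
    best = rescore_nbest(nbest, toy_lm, lm_weight=0.5)[0]
    print(best.text)  # prints "the house is small"
```

Here the decoder slightly prefers the ill-formed word order (-10.0 vs. -10.5), but after adding the weighted toy LM score the combined totals become -12.0 and -11.5, so the fluent candidate moves to the top. Treating the extra LM as one more weighted feature keeps rescoring consistent with the decoder's log-linear model.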
