Continuous Space Language Models for Statistical Machine Translation

Statistical machine translation systems are based on one or more translation models and a language model of the target language. While many different translation models and phrase extraction algorithms have been proposed, most systems use a standard word n-gram back-off language model. In this work, we propose a new statistical language model that is based on a continuous representation of the words in the vocabulary. A neural network performs both the projection into the continuous space and the probability estimation. We consider the translation of European Parliament speeches, a task that was part of an international evaluation organized by the TC-STAR project in 2006. The proposed method achieves consistent improvements in the BLEU score on the development and test data. We also present algorithms that improve the estimation of the language model probabilities when long sentences are split into shorter chunks.
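
To make the idea of a continuous space language model concrete, the following is a minimal sketch of a feed-forward neural language model in plain Python. It is not the authors' implementation. The layer sizes, the tanh hidden layer, the softmax output, the random initialization, and all function and parameter names are assumptions chosen for illustration. It only shows how context words can be projected onto continuous vectors and how a network can turn those vectors into a probability distribution over the next word.

```python
import math
import random


def make_model(vocab_size, context_size, embed_dim, hidden_dim, seed=0):
    """Build randomly initialized parameters (hypothetical sizes, untrained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(vocab_size, embed_dim),                     # projection table
        "w_hidden": mat(hidden_dim, context_size * embed_dim),   # hidden layer weights
        "b_hidden": [0.0] * hidden_dim,
        "w_out": mat(vocab_size, hidden_dim),                    # output layer weights
        "b_out": [0.0] * vocab_size,
    }


def affine(weights, bias, x):
    """Compute weights @ x + bias for a list-of-lists matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def next_word_probs(model, context):
    """Return P(w | context) for every word index w in the vocabulary."""
    # Projection: look up each context word's vector and concatenate them.
    x = [value for word in context for value in model["embed"][word]]
    # Hidden layer with a tanh nonlinearity (an assumption of this sketch).
    h = [math.tanh(a) for a in affine(model["w_hidden"], model["b_hidden"], x)]
    # Output layer followed by a numerically stable softmax.
    scores = affine(model["w_out"], model["b_out"], h)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: a 4-gram style model predicts the next word from three context words.
model = make_model(vocab_size=10, context_size=3, embed_dim=4, hidden_dim=8)
probs = next_word_probs(model, [1, 5, 2])
```

Because the softmax assigns a nonzero probability to every word in the vocabulary, a model of this kind needs no back-off to shorter contexts. In a real system the parameters would be trained on text rather than left at their random initial values.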
