Continuous Space Translation Models with Neural Networks

The use of conventional maximum likelihood estimates hinders the performance of existing phrase-based translation models. Because sufficient training data are lacking, most models consider only a small amount of context. As a partial remedy, we explore here several continuous space translation models, in which translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations. In order to handle a large set of translation units, these representations and the associated estimates are jointly computed using a multi-layer neural network with a SOUL architecture. In small-scale and large-scale English-to-French experiments, we show that the resulting models can effectively be trained and used on top of an n-gram translation system, delivering significant improvements in performance.
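The core idea, estimating the probability of the next translation unit from continuous representations of its context, can be illustrated with a minimal sketch. This is an assumed toy feed-forward model in the spirit of continuous-space n-gram models, not the paper's SOUL architecture; all sizes and parameter names are illustrative.

```python
import numpy as np

# Toy continuous-space n-gram model (illustrative, not the SOUL architecture):
# discrete translation units are mapped to embeddings, the concatenated
# context vector feeds a hidden layer, and a softmax yields a distribution
# over the next unit. Training would fit all parameters jointly by maximum
# likelihood; here they are random.

rng = np.random.default_rng(0)

V = 50   # vocabulary of translation units (toy size)
D = 16   # embedding dimension
N = 3    # context length (n-gram order minus one)
H = 32   # hidden layer size

E = rng.normal(scale=0.1, size=(V, D))        # embedding table
W_h = rng.normal(scale=0.1, size=(N * D, H))  # context -> hidden weights
b_h = np.zeros(H)
W_o = rng.normal(scale=0.1, size=(H, V))      # hidden -> output scores
b_o = np.zeros(V)

def next_unit_distribution(context):
    """Return P(u | context) for every unit u, given N previous unit ids."""
    x = np.concatenate([E[u] for u in context])   # continuous representation
    h = np.tanh(x @ W_h + b_h)
    scores = h @ W_o + b_o
    scores -= scores.max()                        # numerical stability
    p = np.exp(scores)
    return p / p.sum()

p = next_unit_distribution([3, 17, 42])
```

Because the context is a dense vector rather than a discrete n-gram identity, probability mass is shared between similar contexts, which is what mitigates the data-sparsity problem the abstract describes.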
