Joint Language and Translation Modeling with Recurrent Neural Networks

We present a joint language and translation model, based on a recurrent neural network, that predicts target words from an unbounded history of both source and target words. The weaker independence assumptions of this model result in a vastly larger search space than for related feedforward-based language or translation models. We tackle this issue with a new lattice rescoring algorithm and demonstrate its effectiveness empirically. Our joint model builds on a well-known recurrent neural network language model (Mikolov, 2012), augmented with a layer of additional inputs from the source language. It achieves accuracy competitive with traditional channel model features. Our best results improve the output of a system trained on WMT 2012 French-English data by up to 1.5 BLEU, and by 1.1 BLEU on average across several test sets.
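
As a rough illustration of the model described above, the sketch below implements one scoring step of an Elman-style recurrent network whose hidden layer also receives a source-side context vector, in the spirit of the "layer of additional inputs from the source language". Everything in it, NumPy, the dimensions, the one-hot target input, and the bag-of-words source vector, is an assumption made for illustration rather than the paper's actual formulation.

import numpy as np

# Illustrative joint RNN scoring step (all sizes and names are hypothetical).
V_TGT, V_SRC, H = 10000, 10000, 100   # target/source vocab sizes, hidden units

rng = np.random.default_rng(0)
U = rng.normal(0.0, 0.1, (H, V_TGT))  # input weights for previous target word
F = rng.normal(0.0, 0.1, (H, V_SRC))  # extra input layer: source context weights
W = rng.normal(0.0, 0.1, (H, H))      # recurrent weights carrying the full target history
O = rng.normal(0.0, 0.1, (V_TGT, H))  # output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(prev_word, src_context, h_prev):
    # The hidden state mixes the previous target word, the unbounded target
    # history (via h_prev), and a source-side context vector.
    x = np.zeros(V_TGT)
    x[prev_word] = 1.0
    h = np.tanh(U @ x + F @ src_context + W @ h_prev)
    return softmax(O @ h), h          # distribution over the next target word

# Usage: accumulate the log-probability of a target hypothesis given a source.
src = np.zeros(V_SRC)
src[[3, 17, 256]] = 1.0               # hypothetical source word ids
h, logprob = np.zeros(H), 0.0
for prev, nxt in [(1, 42), (42, 7)]:  # hypothetical target word ids
    p, h = step(prev, src, h)
    logprob += np.log(p[nxt])

Note that because the hidden state depends on the entire target prefix, hypotheses cannot be recombined on short n-gram contexts as in standard phrase-based decoding; this is exactly the enlarged search space that the lattice rescoring algorithm mentioned above is designed to handle.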

[1] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[2] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis, 1990, J. Am. Soc. Inf. Sci.

[3] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[4] D. Signorini, et al. Neural networks, 1995, The Lancet.

[5] Joshua Goodman, et al. Classes for fast maximum entropy training, 2001, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

[6] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[7] Daniel Marcu, et al. Statistical Phrase-Based Translation, 2003, NAACL.

[8] Philipp Koehn, et al. Statistical Significance Tests for Machine Translation Evaluation, 2004, EMNLP.

[9] Ahmad Emami, et al. A Neural Syntactic Language Model, 2005, Machine Learning.

[10] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[11] Wolfgang Macherey, et al. Lattice-based Minimum Error Rate Training for Statistical Machine Translation, 2008, EMNLP.

[12] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.

[13] François Yvon, et al. Factored bilingual n-gram language models for statistical machine translation, 2010, Machine Translation.

[14] Alexandre Allauzen, et al. LIMSI @ WMT11, 2011, WMT@EMNLP.

[15] Kenneth Ward Church, et al. A Fast Re-scoring Strategy to Capture Long-Distance Dependencies, 2011, EMNLP.

[16] Lukáš Burget, et al. Extensions of recurrent neural network language model, 2011, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[17] Lukáš Burget, et al. Strategies for training large scale neural network language models, 2011, IEEE Workshop on Automatic Speech Recognition & Understanding.

[18] Sanjeev Khudanpur, et al. Variational approximation of long-span language models for LVCSR, 2011, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19] Alexandre Allauzen, et al. Continuous Space Translation Models with Neural Networks, 2012, NAACL.

[20] Alexandre Allauzen, et al. LIMSI @ WMT12, 2012, WMT@NAACL-HLT.

[21] Tomáš Mikolov. Statistical Language Models Based on Neural Networks, PhD thesis, Brno University of Technology, 2012.

[22] Tara N. Sainath, et al. Deep Neural Network Language Models, 2012, WLM@NAACL-HLT.

[23] Geoffrey Zweig, et al. Context dependent recurrent neural network language model, 2012, IEEE Spoken Language Technology Workshop (SLT).

[24] Holger Schwenk, et al. Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation, 2012, WLM@NAACL-HLT.

[25] Geoffrey Zweig, et al. Speed regularization and optimality in word classing, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[26] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2013, ICML.

[27] Geoffrey Zweig, et al. Linguistic Regularities in Continuous Space Word Representations, 2013, NAACL.

[28] Jianfeng Gao, et al. Learning Semantic Representations for the Phrase Translation Model, 2013, arXiv.

[29] Hermann Ney, et al. Comparison of feedforward and recurrent neural network language models, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).