Combining Techniques from different NN-based Language Models for Machine Translation

This paper presents two improvements to language models based on Restricted Boltzmann Machines (RBMs) for large machine translation tasks. In contrast to other continuous space approaches, RBM-based models can easily be integrated into the decoder and are able to directly learn a hidden representation of the n-gram. Previous work on RBM-based language models does not use a shared word representation and may therefore suffer from a lack of generalization for larger contexts. Moreover, since the training step is very time-consuming, these models have only been used for quite small corpora. In this work we add a shared word representation to the RBM-based language model by factorizing the weight matrix. In addition, we propose an efficient, tailored sampling algorithm that allows us to drastically speed up the training process. Experiments are carried out on two German-to-English translation tasks, and the results show that the training time can be reduced by a factor of 10 without any drop in performance. Furthermore, the RBM-based model can also be trained on large corpora.
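The shared word representation obtained by factorizing the weight matrix can be sketched with the standard energy function of a word-observation RBM; the notation below (R, A_i, d) is illustrative and assumed, not the paper's own:

E(v_1, \dots, v_n, h) = -\sum_{i=1}^{n} v_i^{\top} W_i\, h \;-\; \sum_{i=1}^{n} b^{\top} v_i \;-\; c^{\top} h

where each context word is a one-hot vector v_i \in \{0,1\}^{|V|} and h is the binary hidden layer. Replacing each position-specific matrix W_i \in \mathbb{R}^{|V| \times |H|} by a product

W_i = R\, A_i, \qquad R \in \mathbb{R}^{|V| \times d}, \; A_i \in \mathbb{R}^{d \times |H|}, \; d \ll |V|,

shares the representation matrix R across all n-gram positions, so every word is mapped to the same d-dimensional embedding regardless of where it occurs in the context, while the small per-position matrices A_i keep positional information.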
