Minimum Translation Modeling with Recurrent Neural Networks

We introduce recurrent neural network-based Minimum Translation Unit (MTU) models that make predictions based on an unbounded history of previous bilingual contexts. Traditional back-off n-gram models suffer from the sparsity of MTUs, which makes the estimation of high-order sequence models challenging. We tackle this sparsity problem by modeling MTUs both as bags of words and as sequences of individual source and target words. Our best models improve the output of a phrase-based statistical machine translation system trained on WMT 2012 French-English data by up to 1.5 BLEU, and outperform the traditional n-gram-based MTU approach by up to 0.8 BLEU.
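To make the bag-of-words treatment concrete, the following is a minimal sketch, not the authors' implementation, of how a recurrent network might condition on an unbounded MTU history when each unit is embedded as the sum of its source and target word vectors. The toy vocabularies, dimensions, and random weights are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabularies for illustration only.
src_vocab = {"nous": 0, "acceptons": 1, "votre": 2, "offre": 3}
tgt_vocab = {"we": 0, "accept": 1, "your": 2, "offer": 3}

d_emb, d_hid, n_units = 16, 32, 8  # embedding size, hidden size, MTU vocabulary size

E_src = rng.normal(0, 0.1, (len(src_vocab), d_emb))   # source word embeddings
E_tgt = rng.normal(0, 0.1, (len(tgt_vocab), d_emb))   # target word embeddings
W_ih = rng.normal(0, 0.1, (d_hid, d_emb))             # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (d_hid, d_hid))             # hidden-to-hidden recurrence
W_ho = rng.normal(0, 0.1, (n_units, d_hid))           # hidden-to-output weights

def embed_mtu(src_words, tgt_words):
    """Bag-of-words MTU embedding: sum the source and target word vectors,
    so rare multi-word units share statistics with their component words."""
    vecs = [E_src[src_vocab[w]] for w in src_words]
    vecs += [E_tgt[tgt_vocab[w]] for w in tgt_words]
    return np.sum(vecs, axis=0)

def rnn_step(x, h):
    """One Elman-style recurrence step: the hidden state carries the
    unbounded history of all previously seen bilingual contexts."""
    h_new = np.tanh(W_ih @ x + W_hh @ h)
    logits = W_ho @ h_new
    probs = np.exp(logits - logits.max())
    return h_new, probs / probs.sum()  # softmax over the next MTU

# Score a toy MTU sequence; each unit pairs source words with target words.
mtus = [(["nous"], ["we"]),
        (["acceptons"], ["accept"]),
        (["votre", "offre"], ["your", "offer"])]
h = np.zeros(d_hid)
for src, tgt in mtus:
    h, p = rnn_step(embed_mtu(src, tgt), h)
    print(f"{'/'.join(src)} -> {'/'.join(tgt)}: next-MTU distribution, max p = {p.max():.3f}")
```

The sequence-of-words variant mentioned in the abstract would instead feed the unit's source and target words into the recurrence one at a time rather than summing them into a single input vector; both share the goal of smoothing over MTUs too sparse for direct n-gram estimation.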
