Alignment-Based Neural Machine Translation

Neural machine translation (NMT) has recently emerged as a promising statistical machine translation approach. In NMT, neural networks (NN) are used directly to produce translations, without relying on a pre-existing translation framework. In this work, we take a step towards bridging the gap between conventional word alignment models and NMT. We follow the hidden Markov model (HMM) approach, which separates the alignment and lexical models. We propose a neural alignment model and combine it with a neural lexical model in a log-linear framework. The models are used in a standalone word-based decoder that explicitly hypothesizes alignments during search. We demonstrate that our system outperforms attention-based NMT on two tasks: IWSLT 2013 German→English and BOLT Chinese→English. We also show promising results for re-aligning the training data using neural models.
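The separation of alignment and lexical models that the abstract refers to follows the classical HMM decomposition of word alignment. As a sketch (in its standard form, with the simplest conditioning; the neural models proposed here would condition on richer context), the translation probability of a target sentence $e_1^I$ given a source sentence $f_1^J$ is obtained by summing over hidden alignments $b_1^I$, where $b_i$ is the source position aligned to target position $i$:

```latex
p(e_1^I \mid f_1^J)
  = \sum_{b_1^I} \prod_{i=1}^{I}
    \underbrace{p(e_i \mid f_{b_i})}_{\text{lexical model}}
    \cdot
    \underbrace{p(b_i \mid b_{i-1}, J)}_{\text{alignment model}}
```

In a log-linear framework, the two model scores enter as separately weighted feature functions rather than as a strict product of probabilities, and the decoder hypothesizes the alignment sequence $b_1^I$ explicitly during search instead of marginalizing it out.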
