Alignment-Based Neural Machine Translation

Neural machine translation (NMT) has recently emerged as a promising statistical machine translation approach. In NMT, neural networks (NN) are used directly to produce translations, without relying on a pre-existing translation framework. In this work, we take a step towards bridging the gap between conventional word alignment models and NMT. We follow the hidden Markov model (HMM) approach, which separates the alignment and lexical models. We propose a neural alignment model and combine it with a neural lexical model in a log-linear framework. The models are used in a standalone word-based decoder that explicitly hypothesizes alignments during search. We demonstrate that our system outperforms attention-based NMT on two tasks: IWSLT 2013 German→English and BOLT Chinese→English. We also show promising results for re-aligning the training data using neural models.
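The separation of alignment and lexical models that the abstract refers to follows the classical HMM decomposition of word alignment. As a sketch (in its standard form, with the simplest conditioning; the neural models proposed here would condition on richer context), the translation probability of a target sentence $e_1^I$ given a source sentence $f_1^J$ is obtained by summing over hidden alignments $b_1^I$, where $b_i$ is the source position aligned to target position $i$:

```latex
p(e_1^I \mid f_1^J)
  = \sum_{b_1^I} \prod_{i=1}^{I}
    \underbrace{p(e_i \mid f_{b_i})}_{\text{lexical model}}
    \cdot
    \underbrace{p(b_i \mid b_{i-1}, J)}_{\text{alignment model}}
```

In a log-linear framework, the two model scores enter as separately weighted feature functions rather than as a strict product of probabilities, and the decoder hypothesizes the alignment sequence $b_1^I$ explicitly during search instead of marginalizing it out.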
