Fast and Robust Neural Network Joint Models for Statistical Machine Translation

Recent work has shown success in using neural network language models (NNLMs) as features in MT systems. Here, we present a novel formulation for a neural network joint model (NNJM), which augments the NNLM with a source context window. Our model is purely lexicalized and can be integrated into any MT decoder. We also present several variations of the NNJM which provide significant additive improvements.
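To make the formulation concrete, here is a minimal sketch of the NNJM's core computation: a feed-forward network that scores the next target word given the previous target words plus a window of source words around the aligned (affiliated) source position. The three-word target history and 11-word source window follow the setup described in the paper; the embedding size, hidden width, single tanh layer, random weights, and token ids below are purely illustrative assumptions, not the paper's configuration.

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's settings).
VOCAB = 5000   # vocabulary size
EMB = 192      # embedding dimension
HIDDEN = 512   # hidden layer width
N_TGT = 3      # target history length (4-gram target context)
N_SRC = 11     # source window centred on the affiliated source word

rng = np.random.default_rng(0)
E_src = rng.normal(scale=0.1, size=(VOCAB, EMB))  # source embeddings
E_tgt = rng.normal(scale=0.1, size=(VOCAB, EMB))  # target embeddings
W1 = rng.normal(scale=0.1, size=((N_SRC + N_TGT) * EMB, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))
b2 = np.zeros(VOCAB)

def nnjm_logprob(src_window, tgt_history, next_word):
    """log P(next_word | 3 previous target words, 11-word source window)."""
    # Concatenate all context embeddings into one input vector.
    x = np.concatenate([E_src[src_window].ravel(),
                        E_tgt[tgt_history].ravel()])
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    # Numerically stable log-softmax normalizer.
    log_z = np.log(np.exp(logits - logits.max()).sum()) + logits.max()
    return logits[next_word] - log_z

# Usage: score one target word given random context token ids.
src = rng.integers(0, VOCAB, size=N_SRC)
tgt = rng.integers(0, VOCAB, size=N_TGT)
print(nnjm_logprob(src, tgt, next_word=42))
```

Because the model conditions only on word identities in fixed-size windows (it is purely lexicalized), a decoder can query it like any n-gram feature. The explicit softmax normalizer above is written for clarity; the "fast" in the title comes from the paper's speed techniques, such as self-normalized training, which lets the decoder skip computing the normalizer.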
