From Feature To Paradigm: Deep Learning In Machine Translation

In recent years, deep learning algorithms have revolutionized several areas, including speech, image, and natural language processing. The field of Machine Translation (MT) is no exception. The integration of deep learning in MT ranges from re-modeling existing features within standard statistical systems to the development of entirely new architectures. Among the different neural networks, research has relied on feedforward neural networks, recurrent neural networks, and the encoder-decoder schema. These architectures can tackle challenges such as low-resource settings or morphological variation. This manuscript describes how these neural networks have been integrated to enhance different aspects and models of statistical MT, including language modeling, word alignment, translation, reordering, and rescoring. We then present the neural MT approach itself, together with a description of the foundational works and of recent extensions such as subword and character-level translation and multilingual training, among others. Finally, we analyze the corresponding challenges and future work in applying deep learning to MT.
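As a rough illustration of the encoder-decoder schema mentioned above, the following minimal sketch encodes a source sentence into a fixed-size vector with a recurrent network and greedily decodes a target sentence from it, in the spirit of the RNN encoder-decoder approach. It is written in PyTorch; the vocabulary sizes, hidden dimensions, and token ids are illustrative assumptions and do not reproduce the configuration of any system surveyed here.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) of source token ids
        _, hidden = self.rnn(self.embed(src))
        return hidden  # fixed-size summary of the source sentence

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, hidden):
        # prev_token: (batch, 1); predicts a distribution over the next target token
        output, hidden = self.rnn(self.embed(prev_token), hidden)
        return self.out(output.squeeze(1)), hidden

# Greedy decoding of one (untrained, purely illustrative) translation.
src = torch.tensor([[5, 12, 7, 3]])            # hypothetical source ids; 3 = end-of-sentence
encoder, decoder = Encoder(100), Decoder(100)  # hypothetical vocabulary size of 100
hidden = encoder(src)
token = torch.tensor([[1]])                    # hypothetical start-of-sentence id
translation = []
for _ in range(10):                            # cap the output length
    logits, hidden = decoder(token, hidden)
    token = logits.argmax(dim=-1, keepdim=True)
    if token.item() == 3:                      # stop at end-of-sentence
        break
    translation.append(token.item())
print(translation)

In practice, this basic schema is extended with an attention mechanism, so that the decoder consults all encoder states instead of a single fixed-size vector; the sketch omits attention for brevity.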
