NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems

Abstract: In this paper, we present nmtpy, a flexible Python toolkit based on Theano for training Neural Machine Translation and other neural sequence-to-sequence architectures. nmtpy decouples the specification of a network from the training and inference utilities, simplifying the addition of new architectures and reducing the amount of boilerplate code to be written. nmtpy has been used for LIUM's top-ranked submissions to the WMT Multimodal Machine Translation and News Translation tasks in 2016 and 2017.
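The decoupling the abstract describes can be illustrated with a minimal sketch. This is NOT nmtpy's actual API; the class and method names below are hypothetical, chosen only to show how an architecture definition can be kept separate from a generic, reusable training loop.

```python
# Illustrative sketch (hypothetical names, not nmtpy's real API):
# the architecture class only defines the computation, while a generic
# Trainer can drive any model exposing the same interface.

class Seq2SeqModel:
    """Hypothetical architecture: defines the model, nothing else."""
    def __init__(self, vocab_size, dim):
        self.vocab_size = vocab_size
        self.dim = dim

    def forward(self, batch):
        # Placeholder "loss": a real model would build and evaluate
        # its computation graph here.
        return float(len(batch))


class Trainer:
    """Generic training utility, reusable across architectures."""
    def __init__(self, model):
        self.model = model

    def train(self, batches):
        # Run the model on each batch and report the mean loss.
        losses = [self.model.forward(b) for b in batches]
        return sum(losses) / len(losses)


# Adding a new architecture only requires a new model class;
# the Trainer is untouched.
trainer = Trainer(Seq2SeqModel(vocab_size=30000, dim=256))
avg_loss = trainer.train([[1, 2], [3, 4, 5]])
```

Under this pattern, the per-architecture code shrinks to the model class itself, which is the boilerplate reduction the abstract refers to.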
