On Multiplicative Integration with Recurrent Neural Networks

We introduce a general and simple structural design called Multiplicative Integration (MI) to improve recurrent neural networks (RNNs). MI changes the way in which information from different sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters. The new structure can be easily embedded into many popular RNN models, including LSTMs and GRUs. We empirically analyze its learning behaviour and conduct evaluations on several tasks using different RNN models. Our experimental results demonstrate that Multiplicative Integration can provide a substantial performance boost over many existing RNN models.
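To make the idea concrete, here is a minimal sketch contrasting the standard additive RNN update, phi(Wx + Uh + b), with the general MI form described in the paper, phi(alpha * Wx * Uh + beta1 * Uh + beta2 * Wx + b), where * denotes the elementwise (Hadamard) product. The variable names, dimensions, and initializations below are illustrative assumptions, not the paper's prescribed values.

```python
import numpy as np

def vanilla_rnn_step(x, h, W, U, b):
    """Standard RNN update: the input stream Wx and the recurrent
    stream Uh are integrated additively."""
    return np.tanh(W @ x + U @ h + b)

def mi_rnn_step(x, h, W, U, b, alpha, beta1, beta2):
    """Multiplicative Integration update (general form): the two
    streams interact through a Hadamard product, gated by the extra
    bias vectors alpha, beta1, beta2 -- only O(hidden_size)
    additional parameters over the vanilla update."""
    wx, uh = W @ x, U @ h
    return np.tanh(alpha * wx * uh + beta1 * uh + beta2 * wx + b)

# Tiny usage sketch (dimensions and initial values are assumptions).
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = 0.1 * rng.standard_normal((n_hid, n_in))
U = 0.1 * rng.standard_normal((n_hid, n_hid))
b = np.zeros(n_hid)
alpha = np.ones(n_hid)
beta1 = beta2 = 0.5 * np.ones(n_hid)
x, h = rng.standard_normal(n_in), np.zeros(n_hid)
h = mi_rnn_step(x, h, W, U, b, alpha, beta1, beta2)
```

Setting alpha to zero recovers the purely additive update (up to the beta scalings), which is one way to see why the structure drops into LSTM and GRU blocks with almost no extra parameters: each matrix-vector product in a gate can be integrated this way without changing the matrices themselves.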
