Can recurrent neural networks warp time?

Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically, these models have been found to improve the learning of medium- to long-term temporal dependencies and to help with vanishing-gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long-term dependencies, with minimal implementation effort.
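Since the abstract only states the idea, the following is a minimal PyTorch sketch of what such a chrono initialization could look like, assuming the bias scheme described in the paper (forget-gate bias b_f = log(u) with u ~ Uniform(1, T_max − 1), input-gate bias b_i = −b_f) and PyTorch's [input, forget, cell, output] packing of LSTM bias vectors; the function name `chrono_init` and the parameter `t_max` are illustrative, not part of any library API.

```python
import torch
import torch.nn as nn


def chrono_init(lstm: nn.LSTM, t_max: float) -> None:
    """Chrono initialization of LSTM gate biases (sketch).

    Sets the forget-gate bias to log(u) with u ~ Uniform(1, t_max - 1)
    and the input-gate bias to its negation, so the characteristic
    forgetting times roughly cover the range [1, t_max].
    All other biases are set to zero.
    """
    h = lstm.hidden_size
    with torch.no_grad():
        for layer in range(lstm.num_layers):
            # Zero both bias vectors of this layer first.
            for name in (f"bias_ih_l{layer}", f"bias_hh_l{layer}"):
                getattr(lstm, name).zero_()
            # PyTorch packs each bias as [input, forget, cell, output] gates.
            bias = getattr(lstm, f"bias_ih_l{layer}")
            u = 1.0 + (t_max - 2.0) * torch.rand(h)   # u ~ U(1, t_max - 1)
            bias[h:2 * h] = torch.log(u)              # forget gate: b_f = log(u)
            bias[0:h] = -bias[h:2 * h]                # input gate:  b_i = -b_f


# Usage: pick t_max on the order of the longest dependency expected in the data.
lstm = nn.LSTM(input_size=32, hidden_size=128, num_layers=1)
chrono_init(lstm, t_max=784)  # e.g. pixel-by-pixel image sequences
```

The only design choice beyond the paper's formula is applying the chrono biases to `bias_ih` and leaving `bias_hh` at zero; since PyTorch adds the two bias vectors, only their sum matters.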
