Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences

Recurrent Neural Networks (RNNs) have become the state-of-the-art choice for extracting patterns from temporal sequences. However, current RNN models are ill-suited to processing irregularly sampled data triggered by events generated in continuous time by sensors or other neurons. Such data can occur, for example, when the input comes from novel event-driven artificial sensors that generate sparse, asynchronous streams of events, or from multiple conventional sensors with different update intervals. In this work, we introduce the Phased LSTM model, which extends the LSTM unit by adding a new time gate. This gate is controlled by a parametrized oscillation with a frequency range that requires updates of the memory cell only during a small percentage of the cycle. Even with the sparse updates imposed by the oscillation, the Phased LSTM network achieves faster convergence than regular LSTMs on tasks that require learning of long sequences. The model naturally integrates inputs from sensors of arbitrary sampling rates, thereby opening new areas of investigation for processing asynchronous sensory events that carry timing information. It also greatly improves the performance of LSTMs in standard RNN applications, and does so with an order of magnitude fewer computes.
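
To make the gating mechanism concrete, below is a minimal NumPy sketch of the oscillating time gate. The piecewise form of k_t follows the paper's definition (a linear rise over the first half of the open phase, a linear fall over the second half, and a small leak otherwise); the function and argument names (time_gate, tau, s, r_on, alpha) and the default leak value are illustrative choices, not the authors' code.

```python
import numpy as np

def time_gate(t, tau, s, r_on, alpha=1e-3):
    """Phased LSTM time gate k_t (sketch).

    t     : timestamp(s) of the current input (scalar or array)
    tau   : oscillation period
    s     : phase shift of the oscillation
    r_on  : fraction of the period during which the gate is open
    alpha : leak rate in the closed phase (lets gradients flow)
    """
    # Phase within the current cycle, normalized to [0, 1).
    phi = np.mod(t - s, tau) / tau
    # Piecewise-linear openness: rise, fall, then leaky closed phase.
    k = np.where(
        phi < 0.5 * r_on,
        2.0 * phi / r_on,
        np.where(phi < r_on,
                 2.0 - 2.0 * phi / r_on,
                 alpha * phi))
    return k

# The gate interpolates between the proposed and previous states:
#   c_t = k_t * c_tilde_t + (1 - k_t) * c_{t-1}
#   h_t = k_t * h_tilde_t + (1 - k_t) * h_{t-1}
```

In the full model, tau, s, and r_on are learned per neuron, so different units open at different rates and phases; this is what yields the sparse updates and the natural handling of asynchronous, irregularly timed inputs described above.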
