Learning Long-Term Dependencies in Irregularly-Sampled Time Series

Recurrent neural networks (RNNs) with continuous-time hidden states are a natural fit for modeling irregularly-sampled time series. These models, however, face difficulties when the input data possess long-term dependencies. We prove that, similar to standard RNNs, the underlying reason for this issue is the vanishing or exploding of the gradient during training. This phenomenon is expressed by the ordinary differential equation (ODE) representation of the hidden state, regardless of the choice of ODE solver. We provide a solution by designing a new algorithm based on the long short-term memory (LSTM) that separates its memory from its time-continuous state. This way, we encode a continuous-time dynamical flow within the RNN, allowing it to respond to inputs arriving at arbitrary time lags while ensuring constant error propagation through the memory path. We call these RNN models ODE-LSTMs. We experimentally show that ODE-LSTMs outperform advanced RNN-based counterparts on non-uniformly sampled data with long-term dependencies. All code and data are available at this https URL.
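The sketch below illustrates the idea described in the abstract, not the authors' reference implementation: an LSTM whose memory cell c keeps the standard gated update (the constant-error path), while the output state h is additionally evolved by a learned ODE over the elapsed time between observations. The vector field `f_ode`, the fixed-step Euler solver, the `ode_steps` parameter, and all layer sizes are illustrative assumptions.

```python
# Minimal ODE-LSTM-style cell sketch (assumptions noted in the text above).
import torch
import torch.nn as nn


class ODELSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, ode_steps: int = 4):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size, hidden_size)
        # Learned vector field governing the continuous-time evolution of h.
        self.f_ode = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
        )
        self.ode_steps = ode_steps

    def forward(self, x, h, c, dt):
        # Discrete LSTM update: gates write to the memory c at the observation time.
        h, c = self.lstm(x, (h, c))
        # Continuous-time update: evolve h (but not c) over the elapsed time dt
        # using a fixed-step explicit Euler solver; dt has shape (batch, 1).
        step = dt / self.ode_steps
        for _ in range(self.ode_steps):
            h = h + step * self.f_ode(h)
        return h, c


# Usage on a toy irregularly-sampled batch with arbitrary time gaps.
if __name__ == "__main__":
    batch, seq_len, in_dim, hid_dim = 8, 5, 3, 32
    cell = ODELSTMCell(in_dim, hid_dim)
    x = torch.randn(batch, seq_len, in_dim)
    dt = torch.rand(batch, seq_len, 1)  # elapsed time between observations
    h = torch.zeros(batch, hid_dim)
    c = torch.zeros(batch, hid_dim)
    for t in range(seq_len):
        h, c = cell(x[:, t], h, c, dt[:, t])
    print(h.shape)  # torch.Size([8, 32])
```

Keeping the ODE on the output path only is what lets the memory cell carry gradients across long, unevenly spaced gaps; any ODE solver could replace the Euler loop here.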
