Closed-form Continuous-Depth Models

Continuous-depth neural models, where the derivative of the model's hidden state is defined by a neural network, have enabled strong sequential data processing capabilities. However, these models rely on advanced numerical differential equation (DE) solvers, resulting in significant overhead in both computational cost and model complexity. In this paper, we present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster than their ODE-based counterparts, while exhibiting equally strong modeling abilities. These models are derived from the analytical closed-form solution of an expressive subset of time-continuous models, thus alleviating the need for complex DE solvers altogether. In our experimental evaluations, we demonstrate that CfC networks outperform advanced recurrent models over a diverse set of time-series prediction tasks, including those with long-term dependencies and irregularly sampled data. We believe our findings open new opportunities to train and deploy rich, continuous neural models in resource-constrained settings, which demand both performance and efficiency.
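
To make the idea concrete, below is a minimal sketch of what a closed-form, solver-free recurrent cell in this spirit could look like: instead of integrating dx/dt with a numerical ODE solver, the hidden state is updated in one shot by a time-dependent gate that blends two learned states. The layer sizes, the names (CfCCell, backbone, f, g, h), and the exact parameterization are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of a closed-form continuous-depth (CfC-style) cell in PyTorch.
# Assumption: the update has the gated closed-form structure
#   x(t) = sigmoid(-f * t) * g + (1 - sigmoid(-f * t)) * h,
# where f, g, h are small networks of the input and previous hidden state.
import torch
import torch.nn as nn


class CfCCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Shared backbone over the concatenated input and previous hidden state.
        self.backbone = nn.Sequential(
            nn.Linear(input_size + hidden_size, hidden_size), nn.Tanh()
        )
        # Heads producing the time-constant term and the two blended states.
        self.f = nn.Linear(hidden_size, hidden_size)
        self.g = nn.Linear(hidden_size, hidden_size)
        self.h = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, hidden: torch.Tensor, t: torch.Tensor):
        # t is the elapsed time since the last observation, shape (batch, 1),
        # which is what allows irregularly sampled sequences to be handled.
        z = self.backbone(torch.cat([x, hidden], dim=-1))
        # Closed-form update: no ODE solver, just an explicit function of t.
        gate = torch.sigmoid(-self.f(z) * t)
        return gate * torch.tanh(self.g(z)) + (1.0 - gate) * torch.tanh(self.h(z))


# Usage: step through an irregularly sampled sequence (hypothetical data).
cell = CfCCell(input_size=4, hidden_size=8)
hidden = torch.zeros(1, 8)
for x_t, dt in [(torch.randn(1, 4), torch.tensor([[0.5]])),
                (torch.randn(1, 4), torch.tensor([[1.3]]))]:
    hidden = cell(x_t, hidden, dt)
print(hidden.shape)  # torch.Size([1, 8])
```

Because each step is a single closed-form expression rather than many solver iterations, the per-step cost is fixed and small, which is the source of the speedup claimed over ODE-based counterparts.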
