Neural Rough Differential Equations for Long Time Series

Neural controlled differential equations (CDEs) are the continuous-time analogue of recurrent neural networks, just as Neural ODEs are to residual networks, and offer a memory-efficient, continuous-time way to model functions of potentially irregular time series. Existing methods for computing the forward pass of a Neural CDE involve embedding the incoming time series into path space, often via interpolation, and using evaluations of this path to drive the hidden state. Here, we use rough path theory to extend this formulation. Instead of directly embedding into path space, we represent the input signal over small time intervals through its log-signature, a collection of statistics describing how the signal drives a CDE. This is the approach used to solve rough differential equations (RDEs), and correspondingly we describe our main contribution as the introduction of Neural RDEs. This extension has a purpose: by generalising the Neural CDE approach to a broader class of driving signals, we demonstrate particular advantages for tackling long time series. In this regime, we demonstrate efficacy on problems of length up to 17k observations and observe significant training speed-ups, improvements in model performance, and reduced memory requirements compared to existing approaches.
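
To make the described approach concrete, the following is a minimal sketch of the log-ODE idea behind a Neural RDE: split the observations into small windows, summarise each window by its log-signature, and let a learned vector field, applied to that summary, drive the hidden state. It assumes PyTorch, an unbatched hidden state, a fixed window size, explicit Euler steps, and a depth-2 log-signature computed directly from the discrete observations; the names `depth2_logsignature`, `LogSigVectorField`, and `neural_rde_forward` are illustrative and not the authors' implementation.

```python
import torch
import torch.nn as nn


def depth2_logsignature(x):
    """Depth-2 log-signature of a discrete path x of shape (length, channels).

    Returns a 1-D tensor containing the total increment (level 1) and the
    Levy area (level 2, strictly upper-triangular entries), computed exactly
    for the piecewise-linear interpolation of the observations.
    """
    increments = x[1:] - x[:-1]                      # per-step increments
    increment = increments.sum(dim=0)                # level 1: total increment
    disp = x[:-1] - x[0]                             # displacement before each step
    # Levy area A_ij = 0.5 * sum_k [ disp_k_i * dX_k_j - disp_k_j * dX_k_i ]
    cross = torch.einsum('ki,kj->ij', disp, increments)
    area = 0.5 * (cross - cross.T)
    i, j = torch.triu_indices(x.size(1), x.size(1), offset=1)
    return torch.cat([increment, area[i, j]])


class LogSigVectorField(nn.Module):
    """Learned vector field f_theta: hidden state -> (hidden_dim, logsig_dim) matrix."""

    def __init__(self, hidden_dim, logsig_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.Tanh(),
            nn.Linear(64, hidden_dim * logsig_dim),
        )
        self.hidden_dim, self.logsig_dim = hidden_dim, logsig_dim

    def forward(self, z):
        return self.net(z).view(self.hidden_dim, self.logsig_dim)


def neural_rde_forward(x, t, window, field, z0, n_euler_steps=4):
    """Drive the hidden state with per-window log-signatures (log-ODE method)."""
    z = z0
    for start in range(0, len(x) - 1, window):
        end = min(start + window, len(x) - 1)
        piece = x[start:end + 1]
        dt = t[end] - t[start]
        logsig = depth2_logsignature(piece) / dt     # "average derivative" of the control
        h = dt / n_euler_steps
        for _ in range(n_euler_steps):               # dz/ds = f_theta(z) @ logsig over the window
            z = z + h * (field(z) @ logsig)
    return z
```

For input with `c` channels, the depth-2 log-signature has `c + c*(c-1)//2` entries, so `LogSigVectorField(hidden_dim, c + c*(c-1)//2)` matches the field to the control. The point for long series is that the solver now takes steps at the scale of the windows rather than of the individual observations; a practical implementation would typically use a dedicated log-signature library and an adaptive ODE solver in place of the fixed Euler loop above.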
