Neural CDEs for Long Time Series via the Log-ODE Method

Neural Controlled Differential Equations (Neural CDEs) are the continuous-time analogue of an RNN, just as Neural ODEs are analogous to ResNets. However, just like RNNs, Neural CDEs can be difficult to train on long time series. Here, we propose to apply a technique drawn from stochastic analysis, namely the log-ODE method. Instead of using the original input sequence, our procedure summarises the information over local time intervals via the log-signature map, and uses the resulting shorter stream of log-signatures as the new input. This represents a length/channel trade-off. In doing so, we demonstrate efficacy on problems of length up to 17k observations and observe significant training speed-ups, improvements in model performance, and reduced memory requirements compared to the existing algorithm.
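To make the length/channel trade-off concrete, the sketch below computes depth-2 log-signatures over local windows of a path in pure NumPy. (This is an illustration, not the paper's implementation; the function names are ours, and a practical pipeline would use a dedicated library such as Signatory. At depth 2, the log-signature of a d-channel window consists of the d increment terms plus the d(d-1)/2 Lévy areas.)

```python
import numpy as np

def depth2_logsignature(window):
    """Depth-2 log-signature of a piecewise-linear path segment.

    window: array of shape (length, channels).
    Returns the total increment (channels terms) concatenated with
    the Levy areas (channels * (channels - 1) / 2 terms).
    """
    x = window - window[0]           # translate so the segment starts at 0
    dx = np.diff(x, axis=0)          # per-step increments
    increment = x[-1]                # level 1: total increment over the window
    mid = x[:-1] + 0.5 * dx          # segment midpoints (paths are linear per step)
    d = x.shape[1]
    areas = []
    for i in range(d):
        for j in range(i + 1, d):
            # Levy area: (1/2) * integral of (x_i dx_j - x_j dx_i)
            areas.append(0.5 * np.sum(mid[:, i] * dx[:, j] - mid[:, j] * dx[:, i]))
    return np.concatenate([increment, np.array(areas)])

def logsig_stream(path, window_size):
    """Reduce a long path to a shorter, wider stream of local log-signatures."""
    windows = [path[k:k + window_size + 1]          # adjacent windows share endpoints
               for k in range(0, len(path) - 1, window_size)]
    return np.stack([depth2_logsignature(w) for w in windows])
```

For example, a path of 1601 observations with 3 channels and `window_size=100` becomes a stream of 16 log-signatures with 6 channels each; this shorter stream is what drives the Neural CDE in place of the raw data.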
