Distribution Regression for Sequential Data

Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.
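
As a rough illustration of the feature-based route mentioned above, the sketch below estimates the truncated expected signature of each bag of streams by averaging truncated path signatures, then fits a standard regressor on those features. The use of the `iisignature` library, the truncation depth, the toy data and the ridge regressor are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
import iisignature
from sklearn.linear_model import Ridge

def expected_signature(bag, depth=3):
    """Estimate the truncated expected signature of a bag of streams
    by averaging the truncated signature of each stream.

    bag: list of arrays, each of shape (length_i, d), one stream per entry.
    Returns a vector of length d + d**2 + ... + d**depth.
    """
    return np.mean([iisignature.sig(stream, depth) for stream in bag], axis=0)

# Toy data: each input is a bag of 2-d streams; each bag carries one scalar label
# (here, the bag-average terminal value of the first coordinate).
rng = np.random.default_rng(0)
bags = [[rng.standard_normal((50, 2)).cumsum(axis=0) for _ in range(20)]
        for _ in range(100)]
labels = np.array([np.mean([s[-1, 0] for s in bag]) for bag in bags])

# Bag-level features via the empirical expected signature, then plain ridge regression.
X = np.stack([expected_signature(bag, depth=3) for bag in bags])
model = Ridge(alpha=1.0).fit(X, labels)
```

Because the features are computed per stream before averaging, streams within a bag may be irregularly sampled or of different lengths, which is the robustness property highlighted in the abstract.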
