Variational Gaussian Processes with Signature Covariances

We introduce a Bayesian approach to learning from stream-valued data by using Gaussian processes with the recently introduced signature kernel as the covariance function. To cope with the time and memory complexity that arises with long streams evolving in large state spaces, we develop a variational Bayes approach with sparse inducing tensors. We provide an implementation based on GPflow and benchmark this variational Gaussian process model on supervised classification tasks for time series and text (a stream of words).
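To make the setup concrete, the sketch below shows sparse variational GP classification on flattened streams in GPflow 2.x. It is only an illustration of the variational machinery: the squared-exponential kernel stands in for the signature covariance of the paper, ordinary inducing points stand in for the sparse inducing tensors, and the data, sizes, and hyperparameters are arbitrary.

```python
# Minimal sketch: sparse variational GP classification in GPflow 2.x.
# The SquaredExponential kernel is a stand-in for the signature covariance,
# and the inducing points stand in for the sparse inducing tensors.
import numpy as np
import tensorflow as tf
import gpflow

rng = np.random.default_rng(0)

# Toy data: 100 streams of length T = 20 in a d = 3 state space,
# flattened to fixed-length vectors purely for this illustration.
T, d = 20, 3
X = rng.normal(size=(100, T * d))
Y = rng.integers(0, 2, size=(100, 1)).astype(np.float64)

# Sparse variational GP with M inducing inputs.
M = 10
Z = X[:M].copy()
model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Bernoulli(),
    inducing_variable=Z,
    num_latent_gps=1,
)

# Maximise the evidence lower bound (ELBO) with Adam.
opt = tf.optimizers.Adam(learning_rate=0.01)
loss = model.training_loss_closure((X, Y), compile=True)
for _ in range(200):
    opt.minimize(loss, model.trainable_variables)

# Predictive class probabilities for a few held-in streams.
mean, var = model.predict_y(X[:5])
```

In the paper's setting the kernel would instead be a signature covariance over variable-length streams and the inducing variables would live in tensor space, but the variational objective and training loop follow the same pattern.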
