On Random Subsampling of Gaussian Process Regression: A Graphon-Based Analysis

In this paper, we study random subsampling for Gaussian process regression, one of the simplest approximation baselines, from a theoretical perspective. Although subsampling discards a large part of the training data, we show provable guarantees on the accuracy of the predictive mean and variance and on their generalization ability. For the analysis, we embed kernel matrices into graphons, which absorb differences in sample size and enable us to evaluate the approximation and generalization errors in a unified manner. The experimental results show that the subsampling approximation achieves a better trade-off between accuracy and runtime than the Nystr\"{o}m and random Fourier feature methods.
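To make the baseline concrete, the following is a minimal sketch of the subsampling approximation: fit an exact GP regressor, but only on a uniform random subsample of the training set. The function names (`rbf_kernel`, `subsampled_gp_predict`), the choice of an RBF kernel, and the noise and lengthscale values are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential (RBF) kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def subsampled_gp_predict(X, y, X_test, m, noise=0.1, seed=0):
    """GP predictive mean/variance computed from a uniform random
    subsample of size m, so the linear solve costs O(m^3), not O(n^3)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    Xs, ys = X[idx], y[idx]

    # Standard GP regression equations, restricted to the subsample.
    K = rbf_kernel(Xs, Xs) + noise ** 2 * np.eye(m)
    K_star = rbf_kernel(X_test, Xs)
    alpha = np.linalg.solve(K, ys)
    mean = K_star @ alpha
    v = np.linalg.solve(K, K_star.T)
    var = rbf_kernel(X_test, X_test).diagonal() - np.einsum("ij,ji->i", K_star, v)
    return mean, var
```

The subsample size `m` controls the accuracy/runtime trade-off the abstract refers to: prediction cost drops from cubic in the full sample size n to cubic in m, at the price of discarding n - m observations.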
