The Statistical Cost of Robust Kernel Hyperparameter Tuning

This paper studies the statistical complexity of kernel hyperparameter tuning in the setting of active regression under adversarial noise. We consider the problem of finding the best interpolant from a class of kernels with unknown hyperparameters, assuming only that the noise is square-integrable. We provide finite-sample guarantees for this problem, characterizing how the sample complexity of learning kernel hyperparameters grows with the complexity of the kernel class. For common kernel classes (e.g., squared-exponential kernels with unknown lengthscale), our results show that hyperparameter optimization increases sample complexity by only a logarithmic factor relative to the setting where the optimal parameters are known in advance. Our result is based on a subsampling guarantee for linear regression under multiple design matrices, combined with an ε-net argument for discretizing kernel parameterizations.
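To give a rough picture of the ε-net step mentioned above, the sketch below discretizes an unknown lengthscale on a geometric grid. It is an illustration under stated assumptions, not the paper's construction: the function names, the interval `[l_min, l_max]`, and the choice of a multiplicative grid are all hypothetical. The point it shows is that the grid size grows only logarithmically in `l_max / l_min`, so an argument that pays for each grid point through a union bound would incur a further logarithm of that size.

```python
import math

def lengthscale_net(l_min, l_max, eps):
    """Geometric grid over [l_min, l_max] (hypothetical helper).

    Every lengthscale l in the interval has a grid point g <= l
    with l / g <= 1 + eps.
    """
    if not (0 < l_min <= l_max) or eps <= 0:
        raise ValueError("need 0 < l_min <= l_max and eps > 0")
    # Number of multiplicative steps needed to reach l_max from l_min.
    steps = math.ceil(math.log(l_max / l_min) / math.log1p(eps))
    return [l_min * (1 + eps) ** i for i in range(steps + 1)]

def nearest_below(l, net):
    """Largest grid point that does not exceed l (net is sorted ascending)."""
    return max(g for g in net if g <= l)

def se_kernel(x, y, l):
    """Squared-exponential kernel with lengthscale l."""
    return math.exp(-((x - y) ** 2) / (2 * l ** 2))

if __name__ == "__main__":
    net = lengthscale_net(0.01, 100.0, 0.1)
    l_true = 3.7                      # assumed "unknown" lengthscale
    g = nearest_below(l_true, net)    # its representative in the net
    print(len(net), g, l_true / g)
    # Kernel values at the true and the discretized lengthscale.
    print(se_kernel(0.0, 1.0, l_true), se_kernel(0.0, 1.0, g))
```

Widening the interval by a factor of 100 adds only a constant number of grid points per unit of `log(l_max / l_min)`, which is the kind of behavior consistent with a logarithmic overhead for tuning.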
