Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel k-means Clustering

We present tight lower bounds on the number of kernel evaluations required to approximately solve kernel ridge regression (KRR) and kernel $k$-means clustering (KKMC) on $n$ input points. For KRR, our bound for a relative-error approximation to the minimizer of the objective function is $\Omega(nd_{\mathrm{eff}}^\lambda/\varepsilon)$, where $d_{\mathrm{eff}}^\lambda$ is the effective statistical dimension; this bound is tight up to a $\log(d_{\mathrm{eff}}^\lambda/\varepsilon)$ factor. For KKMC, our bound for finding a $k$-clustering that achieves a relative-error approximation of the objective function is $\Omega(nk/\varepsilon)$, which is tight up to a $\log(k/\varepsilon)$ factor. Our KRR result resolves a variant of an open question of El Alaoui and Mahoney, which asks whether the effective statistical dimension is a lower bound on the sampling complexity. Furthermore, for the important practical case in which the input is a mixture of Gaussians, we provide a KKMC algorithm that bypasses the above lower bound.
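
To make the KRR bound concrete, the following is a minimal sketch in Python with NumPy. It is not taken from the paper. It assumes the common definition $d_{\mathrm{eff}}^\lambda = \mathrm{tr}(K(K+\lambda I)^{-1})$ for an $n \times n$ kernel matrix $K$ (normalizations of $\lambda$ differ across the literature) and the objective $\|K\alpha - z\|_2^2 + \lambda \alpha^\top K \alpha$. The Gaussian kernel, the bandwidth, the synthetic data, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np


def gaussian_kernel_matrix(X, bandwidth=1.0):
    """Full n x n Gaussian kernel matrix; forming it costs n^2 kernel evaluations."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def effective_dimension(K, lam):
    """Assumed definition: tr(K (K + lam I)^{-1}), computed from the eigenvalues of K."""
    eigvals = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return float(np.sum(eigvals / (eigvals + lam)))


def krr_coefficients(K, z, lam):
    """Exact minimizer of ||K a - z||^2 + lam * a^T K a (assumed objective, K positive definite)."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), z)


# Hypothetical data: n = 200 points in 5 dimensions with noisy targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
z = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

K = gaussian_kernel_matrix(X)
lam, eps = 0.1, 0.5
n = K.shape[0]
d_eff = effective_dimension(K, lam)
alpha = krr_coefficients(K, z, lam)

# Compare the scale of the lower bound (constants ignored) with reading all of K.
print(f"d_eff = {d_eff:.2f}")
print(f"n * d_eff / eps = {n * d_eff / eps:.0f} vs n^2 = {n * n}")
```

The comparison is only indicative, since the $\Omega(\cdot)$ notation hides constants. When $d_{\mathrm{eff}}^\lambda/\varepsilon$ is much smaller than $n$, the bound permits algorithms that read far fewer than all $n^2$ kernel entries; when it approaches $n$, no such saving is possible.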
