Paper Information - Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel k-means Clustering

We present tight lower bounds on the number of kernel evaluations required to approximately solve kernel ridge regression (KRR) and kernel $k$-means clustering (KKMC) on $n$ input points. For KRR, our bound for relative error approximation to the minimizer of the objective function is $\Omega(n d_{\mathrm{eff}}^\lambda/\varepsilon)$, where $d_{\mathrm{eff}}^\lambda$ is the effective statistical dimension; this bound is tight up to a $\log(d_{\mathrm{eff}}^\lambda/\varepsilon)$ factor. For KKMC, our bound for finding a $k$-clustering achieving a relative error approximation of the objective function is $\Omega(nk/\varepsilon)$, which is tight up to a $\log(k/\varepsilon)$ factor. Our KRR result resolves a variant of an open question of El Alaoui and Mahoney, who asked whether the effective statistical dimension is a lower bound on the sampling complexity. Furthermore, for the important practical case when the input is a mixture of Gaussians, we provide a KKMC algorithm that bypasses the above lower bound.
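To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes the common definition $d_{\mathrm{eff}}^\lambda = \mathrm{tr}(K(K+\lambda I)^{-1})$ and the KRR objective $\|K\alpha - y\|_2^2 + \lambda\,\alpha^\top K \alpha$; normalization conventions for $\lambda$ vary across papers. The helper names (`rbf_kernel`, `effective_dimension`, `krr_solve`), the Gaussian kernel, and the synthetic data are all illustrative assumptions. The sketch forms the full kernel matrix, so it reads all $n^2$ entries; the printed $n d_{\mathrm{eff}}^\lambda/\varepsilon$ value only indicates the scale of the lower bound, since the $\Omega(\cdot)$ hides constants.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Dense Gaussian (RBF) kernel matrix; this reads all n^2 kernel entries."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def effective_dimension(K, lam):
    """Assumed definition: tr(K (K + lam I)^{-1}) = sum_i s_i / (s_i + lam)."""
    s = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clip tiny negative eigenvalues
    return float(np.sum(s / (s + lam)))

def krr_solve(K, y, lam):
    """Exact minimizer of ||K a - y||^2 + lam * a^T K a: a = (K + lam I)^{-1} y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
n, eps, lam = 200, 0.1, 1.0
X = rng.normal(size=(n, 5))
y = rng.normal(size=n)

K = rbf_kernel(X, gamma=0.5)
d_eff = effective_dimension(K, lam)
alpha = krr_solve(K, y, lam)

print("entries read by the exact solver:", n * n)
print("effective statistical dimension:", d_eff)
print("scale of the lower bound n * d_eff / eps (constants omitted):", n * d_eff / eps)
```

Since $d_{\mathrm{eff}}^\lambda \le n$ always holds, the lower bound is most informative when the spectrum of $K$ decays quickly relative to $\lambda$, so that $d_{\mathrm{eff}}^\lambda/\varepsilon$ is much smaller than $n$.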

David P. Woodruff | Manuel Fernández | Taisuke Yasuda

[1] Piotr Indyk, et al. On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks, 2017, NIPS.

[2] Michael W. Mahoney, et al. Fast Randomized Kernel Methods With Statistical Guarantees, 2014, ArXiv.

[3] Arya Mazumdar, et al. Clustering with Noisy Queries, 2017, NIPS.

[4] Cameron Musco, et al. Recursive Sampling for the Nyström Method, 2016, NIPS.

[5] Bernhard Schölkopf, et al. Kernel Methods in Computational Biology, 2005.

[6] Martin J. Wainwright, et al. Randomized sketches for kernels: Fast and optimal non-parametric regression, 2015, ArXiv.

[7] Katya Scheinberg, et al. Efficient SVM Training Using Low-Rank Kernel Representations, 2002, J. Mach. Learn. Res.

[8] Michael Langberg, et al. A unified framework for approximating and clustering data, 2011, STOC.

[9] Tong Zhang, et al. Learning Bounds for Kernel Regression Using Effective Data Dimensionality, 2005, Neural Computation.

[10] C. Papadimitriou, et al. The complexity of massive data set computations, 2002.

[11] P. Massart, et al. Adaptive estimation of a quadratic functional by model selection, 2000.

[12] Michael B. Cohen, et al. Dimensionality Reduction for k-Means Clustering and Low Rank Approximation, 2014, STOC.

[13] Christos Boutsidis, et al. Unsupervised Feature Selection for the $k$-means Clustering Problem, 2009, NIPS.

[14] Sanjoy Dasgupta, et al. An elementary proof of a theorem of Johnson and Lindenstrauss, 2003, Random Struct. Algorithms.

[15] Cordelia Schmid, et al. Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[16] Aravindan Vijayaraghavan, et al. On Learning Mixtures of Well-Separated Gaussians, 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[17] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.

[18] M. Rudelson, et al. The smallest singular value of a random rectangular matrix, 2008, arXiv:0802.3956.

[19] David P. Woodruff, et al. Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices, 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[20] Adam Tauman Kalai, et al. Adaptively Learning the Crowd Kernel, 2011, ICML.

[21] Francis R. Bach, et al. Sharp analysis of low-rank kernel matrix approximations, 2012, COLT.

[22] Changshui Zhang, et al. On the Sample Complexity of Random Fourier Features for Online Learning, 2014, ACM Trans. Knowl. Discov. Data.

[23] John A. Hartigan, et al. Clustering Algorithms, 1975.

[24] Michael I. Jordan, et al. Kernel independent component analysis, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[25] Andrew Chi-Chih Yao, et al. Probabilistic computations: Toward a unified measure of complexity, 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).

[26] Yishay Mansour, et al. On the Complexity of Learning with Kernels, 2014, COLT.

[27] Pierre Hansen, et al. NP-hardness of Euclidean sum-of-squares clustering, 2008, Machine Learning.

[28] Anthony Widjaja, et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2003, IEEE Transactions on Neural Networks.

[29] David P. Woodruff, et al. Sharper Bounds for Regularized Data Fitting, 2016, APPROX-RANDOM.