Fast and stable deterministic approximation of general symmetric kernel matrices in high dimensions

Kernel methods are used frequently in many machine learning applications. For large-scale applications, their success hinges on the ability to operate with a large, dense kernel matrix K. To reduce the computational cost, Nyström methods can efficiently compute a low-rank approximation to a symmetric positive semi-definite (SPSD) matrix K through landmark points, and many variants have been developed in recent years. For indefinite kernels, however, it has not even been established whether Nyström approximations are applicable. In this paper, we study, for the first time, both theoretically and numerically, the Nyström method for approximating general symmetric kernels, including indefinite ones. We first develop a unified theoretical framework for analyzing Nyström approximations that is valid for both SPSD and indefinite kernels and is independent of the specific scheme for selecting landmark points. To address accuracy and numerical stability issues in Nyström approximation, we then study the impact of data geometry on the spectral properties of the corresponding kernel matrix and leverage discrepancy theory to propose the anchor net method for computing Nyström approximations. The anchor net method operates entirely on the dataset, without requiring access to K or its matrix-vector products, and scales linearly for both SPSD and indefinite kernel matrices. Extensive numerical experiments suggest that indefinite kernels are much more challenging than SPSD kernels and that most existing methods suffer from numerical instability. Results on a variety of kernels and machine learning datasets demonstrate that the new method resolves the numerical instability and achieves better accuracy at lower computational cost than state-of-the-art Nyström methods.
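
For reference, the classical Nyström construction the abstract builds on works as follows: given landmarks Z drawn from the dataset X, form the cross-kernel block C = K(X, Z) and the landmark block W = K(Z, Z), and approximate K ≈ C W⁺ Cᵀ. The sketch below is a minimal NumPy illustration, not the paper's method: it assumes uniform random landmark sampling as a stand-in for the anchor net selection (which the abstract does not detail), and the names nystrom and gaussian_kernel are our own. Because the pseudoinverse of W is computed by truncating eigenvalues small in magnitude, the same code runs unchanged for indefinite symmetric kernels.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (SPSD) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def nystrom(X, kernel, m, rtol=1e-10, rng=None):
    """Rank-<=m Nystrom factors so that K ~= C @ Winv @ C.T.

    Landmarks are chosen by uniform sampling here as a hypothetical
    stand-in for the paper's anchor net selection. The eigenvalue
    truncation is by magnitude, so W may be indefinite.
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=m, replace=False)
    Z = X[idx]
    C = kernel(X, Z)            # n x m cross-kernel block
    W = kernel(Z, Z)            # m x m symmetric landmark block
    lam, U = np.linalg.eigh(W)  # valid for indefinite W as well
    keep = np.abs(lam) > rtol * np.abs(lam).max()
    Winv = (U[:, keep] / lam[keep]) @ U[:, keep].T  # truncated pseudoinverse
    return C, Winv

# Usage: compare the approximation against the exact kernel matrix.
X = np.random.default_rng(0).standard_normal((500, 10))
C, Winv = nystrom(X, gaussian_kernel, m=50, rng=0)
K = gaussian_kernel(X, X)
err = np.linalg.norm(K - C @ Winv @ C.T) / np.linalg.norm(K)
print(f"relative Frobenius error: {err:.2e}")
```

Note that the approximation is formed only from C and Winv; the full n x n matrix K is materialized above solely to measure the error, whereas in a large-scale setting one would work with the factors directly.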
