On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning

A problem for many kernel-based methods is that the amount of computation required to find the solution scales as $O(n^3)$, where $n$ is the number of training examples. We develop and analyze an algorithm to compute an easily interpretable low-rank approximation to an $n \times n$ Gram matrix $G$ such that computations of interest may be performed more rapidly. The approximation is of the form $\tilde{G}_k = C W_k^+ C^T$, where $C$ is a matrix consisting of a small number $c$ of columns of $G$ and $W_k$ is the best rank-$k$ approximation to $W$, the matrix formed by the intersection of those $c$ columns of $G$ with the corresponding $c$ rows of $G$. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we use a judiciously chosen, data-dependent nonuniform probability distribution. Let $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the spectral norm and the Frobenius norm of a matrix, respectively, and let $G_k$ be the best rank-$k$ approximation to $G$. We prove that, by choosing $O(k/\epsilon^4)$ columns, $\|G - C W_k^+ C^T\|_\xi \le \|G - G_k\|_\xi + \epsilon \sum_{i=1}^{n} G_{ii}^2$, both in expectation and with high probability, for both $\xi = 2, F$, and for all $k$ with $0 \le k \le \mathrm{rank}(W)$. This approximation can be computed using $O(n)$ additional space and time, after making two passes over the data from external storage.
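To make the construction concrete, below is a minimal Python/NumPy sketch of the sampling-based approximation described above. The function name `nystrom_approximation`, and the specific choice of sampling probabilities $p_i = G_{ii}^2 / \sum_j G_{jj}^2$ with replacement followed by the usual $1/\sqrt{c\,p_i}$ rescaling, are illustrative assumptions consistent with the error bound's dependence on $\sum_i G_{ii}^2$; this is a sketch of the idea, not a definitive statement of the algorithm's implementation details.

```python
import numpy as np

def nystrom_approximation(G, c, k, rng=None):
    """Rank-k Nystrom-type approximation: G_tilde_k = C @ W_k^+ @ C.T.

    Minimal sketch (see lead-in): columns of G are sampled with
    replacement using the assumed data-dependent probabilities
    p_i = G_ii^2 / sum_j G_jj^2, and the sampled columns/rows are
    rescaled by 1 / sqrt(c * p_i).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = G.shape[0]

    # Data-dependent, nonuniform sampling probabilities.
    diag_sq = np.diag(G) ** 2
    p = diag_sq / diag_sq.sum()

    # Sample c column indices with replacement according to p.
    idx = rng.choice(n, size=c, replace=True, p=p)

    # Scaled column matrix C (n x c) and scaled intersection matrix W (c x c).
    scale = 1.0 / np.sqrt(c * p[idx])
    C = G[:, idx] * scale                              # scale each sampled column
    W = G[np.ix_(idx, idx)] * scale * scale[:, None]   # scale both rows and columns

    # Best rank-k approximation W_k of W via SVD, then its pseudoinverse W_k^+.
    U, s, Vt = np.linalg.svd(W)
    k_eff = int(min(k, np.sum(s > 1e-12)))             # guard against near-zero singular values
    W_k_pinv = (Vt[:k_eff].T / s[:k_eff]) @ U[:, :k_eff].T

    # The Nystrom-type approximation G_tilde_k = C W_k^+ C^T.
    return C @ W_k_pinv @ C.T
```

In practice one would compare $\|G - \tilde{G}_k\|_F$ against $\|G - G_k\|_F$ (from a truncated eigendecomposition of $G$) to observe the additive error term. Note that this sketch materializes $G$ and $\tilde{G}_k$ explicitly, so it does not exhibit the $O(n)$ additional-space, two-pass behavior of the algorithm analyzed in the paper.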
