Parallel Column Subset Selection of Kernel Matrix for Scaling up Support Vector Machines

The Nyström method and low-rank linearized Support Vector Machines (SVMs) are two widely used approaches for scaling up kernel SVMs, and both require sampling a subset of the columns of the kernel matrix to reduce its size. However, existing non-uniform sampling methods have at least quadratic time complexity in the number of training points, which limits the scalability of kernel SVMs. In this paper, we propose a parallel sampling method, parallel column subset selection (PCSS), based on a divide-and-conquer strategy: the kernel matrix is partitioned into several small submatrices, and columns are then selected from each submatrix in parallel. We prove that PCSS achieves a \((1+\epsilon)\) relative-error upper bound on the approximation of the kernel matrix. We further present two approaches to scaling up kernel SVMs by combining PCSS with the Nyström method and with low-rank linearized SVMs. Comparison experiments demonstrate the effectiveness, efficiency and scalability of our approaches.
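To make the divide-and-conquer idea concrete, the following is a minimal sketch of block-wise column selection followed by a Nyström approximation. It assumes an RBF kernel and a simple per-block sampling rule based on squared column norms of the submatrix; the scoring rule, block construction, and parallel scheduling used by PCSS in the paper may differ, and the function and parameter names here are illustrative only.

```python
# Minimal sketch: divide-and-conquer column selection + Nystrom approximation.
# Assumptions (not taken from the paper): RBF kernel, norm-based per-block
# sampling, and a serial loop standing in for parallel block processing.
import numpy as np


def rbf_kernel(X, Y, gamma=0.1):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)


def select_block_columns(X, block_idx, k, gamma=0.1, seed=0):
    """Pick k column indices from one block, sampling proportionally to the
    squared column norms of that block's kernel submatrix."""
    rng = np.random.default_rng(seed)
    K_block = rbf_kernel(X[block_idx], X[block_idx], gamma)
    probs = np.sum(K_block**2, axis=0)
    probs /= probs.sum()
    chosen = rng.choice(len(block_idx), size=k, replace=False, p=probs)
    return block_idx[chosen]


def dc_column_selection_nystrom(X, n_blocks=4, k_per_block=25, gamma=0.1, seed=0):
    """Split the data into blocks, select columns within each block, and build
    the Nystrom factors so that K is approximated by C @ W_pinv @ C.T."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    blocks = np.array_split(rng.permutation(n), n_blocks)
    # Each block is independent, so in a real implementation this loop would
    # run in parallel (e.g. multiprocessing); a serial loop keeps the sketch short.
    cols = np.concatenate([
        select_block_columns(X, b, k_per_block, gamma, seed=i)
        for i, b in enumerate(blocks)
    ])
    C = rbf_kernel(X, X[cols], gamma)        # n x m block of sampled columns
    W = rbf_kernel(X[cols], X[cols], gamma)  # m x m intersection block
    return C, np.linalg.pinv(W)


if __name__ == "__main__":
    X = np.random.randn(1000, 20)
    C, W_pinv = dc_column_selection_nystrom(X)
    K_approx = C @ W_pinv @ C.T  # low-rank surrogate for the full kernel matrix
```

The columns of `C @ sqrtm(W_pinv)` could then serve as explicit low-rank features for a linear SVM solver, which is one way to realize the low-rank linearized SVM combination described above.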
