Scaling Up Kernel SVM on Limited Resources: A Low-Rank Linearization Approach

Kernel support vector machines (SVMs) deliver state-of-the-art results on many real-world nonlinear classification problems, but their computational cost is demanding because a large number of support vectors must be maintained. Linear SVMs, on the other hand, scale well to large data but are suited only to linearly separable problems. In this paper, we propose a novel approach, the low-rank linearized SVM, to scale up kernel SVMs on limited resources. Our approach transforms a nonlinear SVM into a linear one via an approximate empirical kernel map computed from efficient low-rank decompositions of the kernel matrix. We theoretically analyze the gap between the solutions of the approximate and optimal rank-$k$ kernel maps, which in turn provides guidance on the sampling scheme of the Nyström approximation. Furthermore, we extend the approach to a semisupervised metric learning scenario in which partially labeled samples can be exploited to further improve the quality of the low-rank embedding. Our approach inherits the rich representability of kernel SVMs and the high efficiency of linear SVMs. Experimental results demonstrate that our approach is more robust and achieves a better tradeoff between model representability and scalability than state-of-the-art algorithms for large-scale SVMs.
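To make the linearization idea concrete, the following is a minimal sketch, not the authors' implementation: it builds a rank-$k$ empirical kernel map from a Nyström approximation of an RBF kernel with uniformly sampled landmarks, then trains an off-the-shelf linear SVM on the mapped features. The helper names (`rbf`, `nystroem_map`) and all parameter choices are illustrative assumptions; only `numpy` and scikit-learn's `LinearSVC` are real library APIs.

```python
# Sketch of kernel-SVM linearization via a Nystroem empirical kernel map.
# Assumptions: RBF kernel, uniform landmark sampling, synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystroem_map(X, landmarks, gamma=0.5):
    """Rank-k empirical kernel map from a Nystroem approximation.

    With W = K(landmarks, landmarks) = V diag(vals) V^T and
    C = K(X, landmarks), the map Phi = C V diag(vals)^{-1/2}
    satisfies Phi Phi^T ~= C W^+ C^T ~= K.
    """
    W = rbf(landmarks, landmarks, gamma)        # m x m landmark kernel block
    C = rbf(X, landmarks, gamma)                # n x m cross-kernel block
    vals, vecs = np.linalg.eigh(W)              # eigendecompose W
    keep = vals > 1e-10                         # drop near-null directions
    proj = vecs[:, keep] / np.sqrt(vals[keep])  # V diag(vals)^{-1/2}
    return C @ proj                             # n x k linear features

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
landmarks = X[rng.choice(len(X), size=100, replace=False)]  # uniform sampling

Phi = nystroem_map(X, landmarks)
clf = LinearSVC().fit(Phi, y)                   # linear SVM in the mapped space
print("training accuracy:", clf.score(Phi, y))
```

The landmark-selection line is where the paper's theoretical guidance would plug in: replacing uniform sampling with a more informed Nyström sampling scheme changes only that step, while the downstream linear training is unchanged.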
