Very Sparse LSSVM Reductions for Large-Scale Data

Least squares support vector machines (LSSVMs) have been widely applied to classification and regression, with performance comparable to that of standard SVMs. However, the LSSVM model lacks sparsity and cannot handle large-scale data due to computational and memory constraints. The primal fixed-size LSSVM (PFS-LSSVM) introduces sparsity through the Nyström approximation over a set of prototype vectors (PVs) and solves an overdetermined system of linear equations in the primal. This solution, however, is not the sparsest possible. We investigate the sparsity-error tradeoff by introducing a second level of sparsity, realized through L0-norm-based reductions that iteratively sparsify LSSVM and PFS-LSSVM models. The exact cardinality chosen for the initial PV set then becomes unimportant, since the final model is highly sparse. The proposed method overcomes memory constraints and high computational costs, yielding highly sparse reductions of LSSVM models, and the approximations underlying the two models allow them to scale to large datasets. Experiments on real-world classification and regression data sets from the UCI repository show that these approaches achieve sparse models without a significant tradeoff in error.
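As a rough illustration of the pipeline the abstract describes, the sketch below builds a Nyström feature map on a small set of prototype vectors, solves the resulting overdetermined ridge system in the primal (the PFS-LSSVM step), and then applies an iteratively reweighted L2 surrogate for the L0 norm to prune coefficients (the second level of sparsity). This is a minimal sketch under illustrative assumptions: the function names, the RBF kernel choice, and all hyperparameters (`M`, `gamma`, `sigma`, thresholds) are ours, not the paper's notation or exact algorithm.

```python
import numpy as np

def rbf(X, Z, sigma=1.0):
    """RBF kernel matrix between the rows of X and the rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nystrom_features(X, Z, sigma=1.0, jitter=1e-10):
    """Finite-dimensional feature map from the Nystroem approximation
    built on the prototype vectors Z (one column per kept eigenpair)."""
    lam, U = np.linalg.eigh(rbf(Z, Z, sigma) + jitter * np.eye(len(Z)))
    keep = lam > 1e-12                       # drop numerically null directions
    return rbf(X, Z, sigma) @ U[:, keep] / np.sqrt(lam[keep])

def pfs_lssvm(X, y, Z, gamma=10.0, sigma=1.0):
    """Primal fixed-size LSSVM: ridge regression on Nystroem features,
    i.e. the least-squares solution of an overdetermined linear system.
    (For simplicity the bias column is regularized along with the weights.)"""
    Phi = np.column_stack([nystrom_features(X, Z, sigma), np.ones(len(X))])
    A = Phi.T @ Phi + np.eye(Phi.shape[1]) / gamma
    return np.linalg.solve(A, Phi.T @ y)     # last entry is the bias b

def l0_reduce(Phi, y, gamma=10.0, iters=20, eps=1e-6):
    """Second level of sparsity: iteratively reweighted ridge with
    weights 1/(w_i^2 + eps) as a surrogate for the L0 norm."""
    w = np.linalg.solve(Phi.T @ Phi + np.eye(Phi.shape[1]) / gamma, Phi.T @ y)
    for _ in range(iters):
        D = np.diag(1.0 / (w ** 2 + eps))    # L0 surrogate penalty
        w = np.linalg.solve(gamma * Phi.T @ Phi + D, gamma * Phi.T @ y)
    return np.where(np.abs(w) > 1e-4)[0], w  # surviving components, weights

# Toy usage on synthetic data: 30 initial PVs, most pruned afterwards.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.sign(X[:, 0] * X[:, 1])
Z = X[rng.choice(len(X), 30, replace=False)]  # initial prototype vectors
Phi = np.column_stack([nystrom_features(X, Z), np.ones(len(X))])
support, w = l0_reduce(Phi, y)
```

In the paper the reweighting is applied to the LSSVM and PFS-LSSVM solutions themselves; here it is shown on a generic design matrix only to make the sparsity-error tradeoff concrete, and the pruning threshold is an arbitrary illustrative choice.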
