Approximate Model Selection for Large Scale LSSVM

Model selection is critical to least squares support vector machine (LSSVM). A major problem with existing model selection approaches for LSSVM is that the inverse of the kernel matrix needs to be computed at O(n^3) cost for each iteration, where n is the number of training examples; this is prohibitive for large-scale applications. In this paper, we propose an approximate approach to model selection of LSSVM. We use multilevel circulant matrices to approximate the kernel matrix, so that the fast Fourier transform (FFT) can be applied to reduce the computational cost of the matrix inverse. With such approximation, we first design an efficient LSSVM algorithm with O(n log n) complexity and theoretically analyze the effect of kernel matrix approximation on the decision function of LSSVM. We further show that the approximate optimal model produced with the multilevel circulant matrix is consistent with the accurate one produced with the original kernel matrix. Under the guarantee of consistency, we present an approximate model selection scheme whose complexity is significantly lower than that of previous approaches. Experimental results on benchmark datasets demonstrate the effectiveness of approximate model selection.
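The computational step the abstract relies on is that a (multilevel) circulant matrix is diagonalized by the (multidimensional) DFT, so a linear system with it can be solved in O(n log n) time rather than the O(n^3) of a dense solve. Below is a minimal sketch of this idea for an ordinary one-level circulant matrix in NumPy; the kernel-like first column, the size n, and the function name solve_circulant are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def solve_circulant(c, b):
    """Solve C x = b, where C is the circulant matrix with first column c.

    C is diagonalized by the DFT, so its eigenvalues are fft(c) and the
    solve reduces to elementwise division in the Fourier domain:
    x = ifft(fft(b) / fft(c)). Total cost: O(n log n).
    """
    eig = np.fft.fft(c)  # eigenvalues of C
    return np.real(np.fft.ifft(np.fft.fft(b) / eig))

# Toy illustration on a synthetic, symmetric "kernel-like" circulant system.
n = 8
dist = np.minimum(np.arange(n), n - np.arange(n))   # circular distance 0..n/2
c = np.exp(-dist.astype(float) ** 2)                # Gaussian-like first column
b = np.random.default_rng(0).standard_normal(n)

x = solve_circulant(c, b)

# Check against the explicit dense circulant matrix C[i, j] = c[(i - j) mod n].
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
assert np.allclose(C @ x, b)
```

In the paper's setting the same diagonalization argument is applied level-wise with multidimensional FFTs, which is what makes repeated matrix inversions inside the model selection loop affordable.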
