Fast Bayesian support vector machine parameter tuning with the Nyström method

We experiment with speeding up a Bayesian method for tuning the hyperparameters of a support vector machine (SVM) classifier. The Bayesian approach gives the gradients of the evidence as averages over the posterior, which can be approximated using hybrid Monte Carlo (HMC) simulation. By using the Nyström approximation to the SVM kernel, our method significantly reduces the dimensionality of the space to be simulated in the HMC. We show that this reduces the running time of the HMC simulation from O(n^2) (with a large prefactor) to effectively O(n), where n is the number of training samples. We conclude that the Nyström approximation has an almost insignificant effect on the performance of the algorithm when compared to the full Bayesian method, and gives excellent performance in comparison with other approaches to hyperparameter tuning.
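As a rough illustration of the kernel approximation mentioned above, the following sketch builds a Nyström feature map for an RBF kernel with NumPy. It is not the authors' implementation and does not include the HMC simulation or the evidence gradients. The function names (`rbf_kernel`, `nystrom_features`), the RBF kernel choice, the uniform random landmark selection, and the parameter values are all assumptions made for illustration. The point it shows is that an n x n kernel matrix can be replaced by an n x m factor with m much smaller than n, which is the kind of dimensionality reduction the abstract refers to.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, m, gamma=1.0, seed=0):
    """Return Phi (n x r, r <= m) such that Phi @ Phi.T approximates K.

    Landmarks are chosen uniformly at random (an assumption; other
    selection schemes exist).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    K_nm = rbf_kernel(X, X[idx], gamma)        # n x m block
    K_mm = K_nm[idx]                           # m x m landmark block
    K_mm = 0.5 * (K_mm + K_mm.T)               # enforce exact symmetry
    evals, evecs = np.linalg.eigh(K_mm)
    keep = evals > 1e-8 * evals.max()          # drop near-null directions
    # K is approximated by K_nm K_mm^{-1} K_mn = Phi Phi^T
    return K_nm @ (evecs[:, keep] / np.sqrt(evals[keep]))

# Toy data: 200 points in 2-D, 40 landmarks (values chosen arbitrarily).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 2))
Phi = nystrom_features(X, m=40, gamma=0.5)

K_full = rbf_kernel(X, X, gamma=0.5)
rel_err = np.linalg.norm(K_full - Phi @ Phi.T) / np.linalg.norm(K_full)
print(f"feature shape: {Phi.shape}, relative Frobenius error: {rel_err:.2e}")
```

Working with the columns of `Phi` instead of the full kernel matrix means any downstream computation acts on at most m coordinates per sample rather than n, so the cost per pass over the data grows linearly in n for fixed m.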
