How to tune the RBF SVM hyperparameters?: An empirical evaluation of 18 search algorithms

An SVM with an RBF kernel is usually one of the best classification algorithms for most data sets, but it is important to tune its two hyperparameters, $C$ and $\gamma$, to the data itself. In general, selecting the hyperparameters is a non-convex optimization problem, and many algorithms have been proposed to solve it, among them grid search, random search, Bayesian optimization, simulated annealing, particle swarm optimization, Nelder-Mead, and others. There have also been proposals to decouple the selection of $\gamma$ and $C$. We empirically compare 18 of these search algorithms (under different parameterizations, for a total of 47 combinations) on 115 real-life binary data sets. We find, among other things, that tree of Parzen estimators and particle swarm optimization select better hyperparameters with only a slight increase in computation time over a grid search with the same number of evaluations. We also find that spending more computational effort on the hyperparameter search is unlikely to improve performance on future data, and that there are no significant differences among the procedures for selecting the best set of hyperparameters when the search algorithm finds more than one.
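To make the comparison concrete, here is a minimal sketch, not the paper's actual protocol, of tuning $C$ and $\gamma$ under a fixed evaluation budget with two of the strategies compared in the study: a logarithmic grid search (via scikit-learn) and a tree-of-Parzen-estimators (TPE) search (via hyperopt). The data set, the search ranges, and the 49-evaluation budget are illustrative assumptions, not the ones used in the study.

```python
import numpy as np
from hyperopt import fmin, hp, tpe
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative data set; the paper uses 115 real-life binary data sets.
X, y = load_breast_cancer(return_X_y=True)

def rbf_svm(C=1.0, gamma="scale"):
    # Standardize features, then fit an RBF-kernel SVM.
    return make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))

# Strategy 1: a 7x7 logarithmic grid over C and gamma (49 evaluations).
grid = GridSearchCV(
    rbf_svm(),
    param_grid={
        "svc__C": np.logspace(-2, 4, 7),
        "svc__gamma": np.logspace(-5, 1, 7),
    },
    cv=5,
)
grid.fit(X, y)

# Strategy 2: TPE with the same 49-evaluation budget over the same
# log-scaled ranges; hyperopt minimizes, so the objective returns
# 1 - mean cross-validated accuracy.
def objective(params):
    acc = cross_val_score(rbf_svm(params["C"], params["gamma"]), X, y, cv=5).mean()
    return 1.0 - acc

best = fmin(
    fn=objective,
    space={
        "C": hp.loguniform("C", np.log(1e-2), np.log(1e4)),
        "gamma": hp.loguniform("gamma", np.log(1e-5), np.log(1e1)),
    },
    algo=tpe.suggest,
    max_evals=49,
)

print("grid best:", grid.best_params_, "CV accuracy:", grid.best_score_)
print("TPE  best:", best)
```

Note that in both cases the cross-validated score used by the search is an optimistically biased estimate of future performance, so the selected hyperparameters should be evaluated on held-out data, which is the setting in which the paper's findings about effort versus future performance apply.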
