A nested heuristic for parameter tuning in Support Vector Machines

The default approach for tuning the parameters of a Support Vector Machine (SVM) is a grid search in the parameter space. Various metaheuristics have recently been proposed as a more efficient alternative, but they have only been shown to be useful for models with a small number of parameters. Complex models, involving many parameters, can be seen as extensions of simpler, easy-to-tune models, yielding a nested sequence of models of increasing complexity. In this paper we propose an algorithm that successfully exploits this nested property, with two main advantages over the state of the art. First, our framework is general enough to address, with the very same method, several popular SVM parameter models encountered in the literature. Second, its only algorithmic requirement is either an SVM library or any routine for minimizing convex quadratic functions under linear constraints. In the computational study, we address Multiple Kernel Learning tuning problems for which grid search would clearly be infeasible, while our classification accuracy is comparable to that of ad hoc, model-dependent benchmark tuning methods.
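The nested idea can be illustrated with a minimal sketch, assuming a hypothetical two-level family: a simple model tuned in (log C, one shared log kernel width) and a richer model with one log width per feature. Tying all widths to the same value recovers the simple model, so the simple model's optimum, lifted into the richer parameter space, is a valid starting point whose error is already known. The local search is a plain compass (coordinate) search swapped in for illustration; it is not the heuristic proposed in the paper. `toy_error` is a made-up quadratic standing in for cross-validated SVM error, which in practice would require training an SVM for every parameter vector evaluated.

```python
from typing import Callable, List, Tuple

Vector = List[float]  # parameters in log scale: [log C, log width(s)...]


def compass_search(f: Callable[[Vector], float], x0: Vector,
                   step: float = 1.0, min_step: float = 1e-3) -> Tuple[Vector, float]:
    """Derivative-free descent: try +/- step on each coordinate and halve the
    step when no move improves. Only strict improvements are accepted, so the
    returned value never exceeds f(x0)."""
    x, fx = list(x0), f(x0)
    while step > min_step:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step /= 2.0
    return x, fx


def lift(simple: Vector, n_widths: int) -> Vector:
    """Map [log C, shared log width] to [log C, w_1, ..., w_d] with all widths tied."""
    log_c, log_w = simple
    return [log_c] + [log_w] * n_widths


def nested_tune(error_full: Callable[[Vector], float], n_widths: int):
    """Hypothetical two-level nested tuning: solve the simple model first,
    then seed the richer model's search at the lifted simple optimum."""
    def error_simple(p: Vector) -> float:
        # The simple model is the rich model with all widths tied.
        return error_full(lift(p, n_widths))

    p0, f0 = compass_search(error_simple, [0.0, 0.0])
    p1, f1 = compass_search(error_full, lift(p0, n_widths))
    return (p0, f0), (p1, f1)


# Hypothetical stand-in for validation error (NOT a real SVM): a quadratic bowl
# whose minimizer has unequal widths, so the rich model can beat the simple one.
TARGET = [0.5, -1.0, 2.0, 0.0]  # [log C, w_1, w_2, w_3]


def toy_error(p: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, TARGET))


if __name__ == "__main__":
    (p0, f0), (p1, f1) = nested_tune(toy_error, n_widths=3)
    print(f"simple model: params={p0}, error={f0:.4f}")
    print(f"nested model: params={p1}, error={f1:.4f}")
```

Because the search accepts only strict improvements and starts from the lifted optimum, the richer model's error in this sketch can never exceed the simpler model's. For comparison, a grid with 10 candidate values per parameter would already need 10^4 = 10,000 evaluations for the four level-1 parameters here, and that count grows exponentially with the number of parameters.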
