Model Selection via Bilevel Optimization

A key step in many statistical learning methods used in machine learning involves solving a convex optimization problem containing one or more hyper-parameters that must be selected by the user. While cross validation is a commonly employed and widely accepted method for selecting these parameters, its implementation by a grid search over the parameter space effectively limits the number of hyper-parameters a model can practically have, because the number of grid points explodes combinatorially in high dimensions: with k candidate values for each of d hyper-parameters, a full grid has k^d points. This paper proposes a novel bilevel optimization approach to cross validation that provides a systematic search of the hyper-parameters. The bilevel approach enables the use of state-of-the-art optimization methods and their well-supported software. After introducing the bilevel programming approach, we discuss computational methods for solving a bilevel cross-validation program, and present numerical results to substantiate the viability of this novel approach as a promising computational tool for model selection in machine learning.
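
The contrast between grid search and a bilevel formulation can be made concrete with a small sketch. It is an illustrative assumption, not the method of this paper. The inner (lower-level) convex problem is taken to be a one-dimensional ridge regression whose solution w(lambda) has a closed form. The outer (upper-level) problem minimizes the T-fold cross-validation error over lambda by gradient descent on log(lambda), differentiating through w(lambda). The approach described in the abstract is instead meant to hand the bilevel program to general-purpose optimization solvers. All function names and the toy data below (grid_size, ridge_1d, cv_objective, tune_lambda, make_folds) are hypothetical.

```python
import math


def grid_size(points_per_param: int, num_params: int) -> int:
    """Points a full grid search must evaluate: exponential in num_params."""
    return points_per_param ** num_params


def ridge_1d(xs, ys, lam):
    """Inner (lower-level) problem, assumed for illustration: 1-D ridge
    regression without intercept, argmin_w 0.5*sum((w*x - y)**2) + 0.5*lam*w**2.
    Solved in closed form; returns w(lam) and its derivative dw/dlam."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    w = sxy / (sxx + lam)
    return w, -w / (sxx + lam)


def cv_objective(folds, lam):
    """Outer (upper-level) objective: mean squared validation error over all
    folds, and its derivative w.r.t. lam via the chain rule through w(lam)."""
    err, grad, count = 0.0, 0.0, 0
    for (x_tr, y_tr), (x_val, y_val) in folds:
        w, dw = ridge_1d(x_tr, y_tr, lam)
        for x, y in zip(x_val, y_val):
            r = w * x - y
            err += r * r
            grad += 2.0 * r * x * dw
            count += 1
    return err / count, grad / count


def tune_lambda(folds, lam0=1.0, step=1.0, iters=200):
    """Gradient descent on theta = log(lam), which keeps lam > 0. A step is
    accepted only if it lowers the CV error; otherwise the step is halved."""
    theta = math.log(lam0)
    err, grad = cv_objective(folds, lam0)
    for _ in range(iters):
        # d(err)/d(theta) = lam * d(err)/d(lam); clip theta to avoid exp overflow.
        cand = theta - step * math.exp(theta) * grad
        cand = max(-30.0, min(30.0, cand))
        cand_err, cand_grad = cv_objective(folds, math.exp(cand))
        if cand_err < err:
            theta, err, grad = cand, cand_err, cand_grad
        else:
            step *= 0.5
    return math.exp(theta), err


def make_folds(xs, ys, t):
    """Split the data into t (train, validation) pairs by index modulo t."""
    folds = []
    for k in range(t):
        tr = [i for i in range(len(xs)) if i % t != k]
        va = [i for i in range(len(xs)) if i % t == k]
        folds.append((([xs[i] for i in tr], [ys[i] for i in tr]),
                      ([xs[i] for i in va], [ys[i] for i in va])))
    return folds


# Hypothetical toy data: y = 2x plus a deterministic +/- perturbation.
xs = [0.5 * i for i in range(1, 13)]
ys = [2.0 * x + (1.5 if i % 2 else -1.0) for i, x in enumerate(xs)]
folds = make_folds(xs, ys, t=3)

lam_star, err_star = tune_lambda(folds)
print("grid points for 6 hyper-parameters, 10 values each:", grid_size(10, 6))
print("tuned lambda:", lam_star, "3-fold CV error:", err_star)
```

Each outer step here costs T inner solves. With one gradient component per hyper-parameter, that cost grows with the number of hyper-parameters rather than as k^d, which is the scaling argument the abstract makes against grid search.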
