Hyperparameter Search in Machine Learning

We introduce the hyperparameter search problem in machine learning and discuss its main challenges from an optimization perspective. Machine learning methods attempt to build models that capture some element of interest from given data. Most common learning algorithms have a set of hyperparameters that must be fixed before training begins. The choice of hyperparameters can significantly affect the performance of the resulting model, but finding good values can be difficult; a disciplined, theoretically sound search strategy is therefore essential.
