A Strategy for Ranking Optimization Methods using Multiple Criteria

An important component of an automated machine learning process is model selection, which typically includes selecting good hyperparameters. Hyperparameter optimization is often conducted with a black-box tool, but because different tools perform better in different circumstances, automating the machine learning workflow may also involve choosing the appropriate optimization method for a given situation. This paper proposes a mechanism for comparing the performance of multiple optimization methods on multiple performance metrics across a range of optimization problems. Nonparametric statistical tests convert the metrics recorded for each problem into a partial ranking of the optimization methods; the results from each problem are then amalgamated through a voting mechanism to produce a final score for each method. Mathematical analysis motivates the decisions within this strategy, and sample results demonstrate the impact of certain design decisions.
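The abstract does not specify which nonparametric test or voting scheme is used, so the following is only a minimal sketch of the overall pipeline. It assumes a one-sided Mann-Whitney U test for the pairwise comparisons on each problem and a Borda-style win count for the aggregation step; these choices, along with the function names partial_ranking and aggregate and the optimizer labels, are illustrative assumptions rather than the authors' actual method.

    import numpy as np
    from scipy.stats import mannwhitneyu

    def partial_ranking(samples, alpha=0.05):
        """Build a partial ranking of optimizers on one problem.

        `samples` maps an optimizer name to an array of metric values
        (e.g. best validation loss over repeated runs, lower is better).
        Returns wins[a], the number of optimizers that `a` beats at
        significance level `alpha`; incomparable pairs contribute nothing.
        """
        names = list(samples)
        wins = {name: 0 for name in names}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                # One-sided tests in both directions; smaller metric is better.
                p_a_better = mannwhitneyu(samples[a], samples[b], alternative="less").pvalue
                p_b_better = mannwhitneyu(samples[b], samples[a], alternative="less").pvalue
                if p_a_better < alpha:
                    wins[a] += 1
                elif p_b_better < alpha:
                    wins[b] += 1
                # otherwise the pair is left incomparable (partial order)
        return wins

    def aggregate(per_problem_wins):
        """Borda-style voting: sum each optimizer's wins across all problems."""
        total = {}
        for wins in per_problem_wins:
            for name, w in wins.items():
                total[name] = total.get(name, 0) + w
        return sorted(total.items(), key=lambda kv: kv[1], reverse=True)

    # Toy usage with synthetic metric values (lower is better).
    rng = np.random.default_rng(0)
    problems = [
        {"random_search": rng.normal(1.0, 0.1, 20),
         "bayes_opt": rng.normal(0.8, 0.1, 20),
         "pso": rng.normal(0.9, 0.1, 20)}
        for _ in range(5)
    ]
    print(aggregate([partial_ranking(p) for p in problems]))

In this sketch, a pair of optimizers whose metric samples are not significantly different is simply left unranked, which is what makes the per-problem result a partial rather than total ranking before the voting step combines the problems.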
