Frugal Optimization for Cost-related Hyperparameters

The increasing demand for democratizing machine learning calls for hyperparameter optimization (HPO) solutions at low cost. Many machine learning algorithms have hyperparameters that can cause a large variation in training cost, yet this effect is largely ignored by existing HPO methods, which cannot properly control cost during the optimization process. To address this problem, we develop a new cost-frugal HPO solution. At its core is a simple yet novel randomized direct-search method, for which we provide theoretical guarantees on the convergence rate and the total cost incurred to achieve convergence. We report strong empirical results in comparison with state-of-the-art HPO methods on large AutoML benchmarks.
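
To make the direct-search idea concrete, here is a minimal Python sketch of a generic randomized direct-search loop of the kind the abstract alludes to. It is an illustration under stated assumptions, not the paper's algorithm: the function name `randomized_direct_search`, the step-halving schedule, the box clipping, and the quadratic toy objective are all hypothetical stand-ins, and the paper's actual method reasons about evaluation cost in ways this sketch does not capture.

```python
import numpy as np

def randomized_direct_search(loss, x0, lb, ub, step=0.1, budget=100, seed=0):
    """Sketch of a randomized direct-search loop for hyperparameter tuning.

    Each iteration samples a random unit direction; the incumbent moves to
    x + step*u or x - step*u if either improves the loss, and the step size
    shrinks when neither does.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best = loss(x)
    for _ in range(budget):
        u = rng.normal(size=x.shape)
        u /= np.linalg.norm(u)                      # random unit direction
        improved = False
        for cand in (x + step * u, x - step * u):   # try both signs
            cand = np.clip(cand, lb, ub)            # stay inside the search box
            val = loss(cand)
            if val < best:
                x, best, improved = cand, val, True
                break
        if not improved:
            step *= 0.5                             # shrink the step on failure
    return x, best


# Toy usage: "tune" two hyperparameters of a quadratic surrogate loss.
if __name__ == "__main__":
    f = lambda z: (z[0] - 0.3) ** 2 + (z[1] - 0.7) ** 2
    x_best, l_best = randomized_direct_search(f, x0=[0.0, 0.0], lb=0.0, ub=1.0)
    print(x_best, l_best)
```

In a real HPO setting, `loss` would wrap training and validation of a model at the candidate configuration, which is exactly where the per-evaluation cost the paper targets comes from.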