Derivative-Free Optimization with Adaptive Experience for Efficient Hyper-Parameter Tuning

Hyper-parameter tuning is a core part of automatic machine learning (AutoML), which aims to automatically configure machine learning systems in deployed applications. Hyper-parameter tuning has usually been formulated as a black-box optimization problem, for which a derivative-free optimization (DFO) solver is often employed. Such solvers often suffer from low efficiency. Experienced DFO was therefore proposed, which uses data from historical optimization processes to guide the optimization on new problems. However, the effectiveness of experienced DFO is sensitive to the relevance between the experienced tasks and the target tasks: relevant experience can accelerate convergence, while irrelevant experience can harm it. This paper proposes an adaptation mechanism for experienced DFO. It learns a set of experience models to guide the DFO processes, and tests these models on a few labeled samples from the target task. By comparing model predictions with the ground-truth labels, it adaptively learns the relevant experience by weighting those models. Experiments on synthetic tasks verify that the proposed method can effectively adopt the relevant experience for a range of target tasks. Furthermore, we apply the proposed method to the task of configuring LightGBM hyper-parameters. The empirical results show that the proposed method effectively selects the relevant experience and significantly improves the performance of hyper-parameter tuning in only a few iterations.
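The weighting step described above can be illustrated with a minimal sketch. It is not the paper's actual procedure: the softmax-over-error rule, the `temperature` parameter, the function names, and the toy one-dimensional models are all assumptions made for illustration. The only idea taken from the abstract is that each experience model is scored against a few labeled target samples, and models whose predictions match the labels receive larger weights.

```python
import math


def weight_experience_models(models, samples, temperature=0.1):
    """Return one weight per model (hypothetical helper, not from the paper).

    Each model is scored by its mean squared error on the labeled target
    samples; a softmax over the negative errors turns scores into weights.
    """
    errors = []
    for model in models:
        mse = sum((model(x) - y) ** 2 for x, y in samples) / len(samples)
        errors.append(mse)
    # Shift by the smallest error so the largest exponent is exp(0) = 1.
    best = min(errors)
    scores = [math.exp(-(e - best) / temperature) for e in errors]
    total = sum(scores)
    return [s / total for s in scores]


def combined_prediction(models, weights, x):
    """Weighted sum of the model predictions at configuration x."""
    return sum(w * m(x) for w, m in zip(weights, models))


# Toy setup (assumed): two experience models and a target objective that
# closely resembles the first one.
def relevant(x):
    return (x - 0.3) ** 2


def irrelevant(x):
    return (x + 0.8) ** 2


def target(x):
    return (x - 0.3) ** 2 + 0.01


# A few labeled samples from the target task.
samples = [(x, target(x)) for x in (0.0, 0.25, 0.5, 1.0)]
weights = weight_experience_models([relevant, irrelevant], samples)
prediction = combined_prediction([relevant, irrelevant], weights, 0.3)
```

In this toy setup nearly all of the weight goes to `relevant`, so the combined prediction is dominated by the experience that matches the target task. A lower `temperature` makes the selection sharper, while a higher one blends the models more evenly.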
