RPR-BP: A Deep Reinforcement Learning Method for Automatic Hyperparameter Optimization

We introduce a new deep reinforcement learning architecture, RPR-BP, that optimizes the hyperparameters of any machine learning model on a given data set. In this method, an agent built on a Long Short-Term Memory (LSTM) network aims to maximize the expected accuracy of a machine learning model on a validation set. At each iteration, the agent selects a set of hyperparameters and uses the resulting validation accuracy as the reward signal to update its internal parameters; over many iterations, it learns to make better decisions. However, computing the reward is time-consuming and leads to low sample efficiency. To speed up training, we employ a neural network to predict the reward. The training process for the agent and the prediction network is divided into three phases, Real-Predictive-Real (RPR): first, the agent and the prediction network are trained on real experience; then, the agent is trained on rewards generated by the prediction network; finally, the agent is trained again on real experience. This scheme accelerates training while allowing the agent to reach high accuracy. In addition, to reduce variance, we propose a Bootstrap Pool (BP) that guides exploration of the search space. We evaluate the method by optimizing the hyperparameters of two widely used machine learning models, Random Forest and XGBoost. Experimental results show that the proposed method outperforms random search, Bayesian optimization, and the Tree-structured Parzen Estimator in terms of accuracy, time efficiency, and stability.
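The sketch below is a minimal, illustrative rendering of the RPR idea, not the authors' implementation: an LSTM controller samples discretized hyperparameters, a REINFORCE update (with a moving-average baseline) trains the controller, and a small MLP stands in for the reward-prediction network during the middle "Predictive" phase. The search grid, data set (scikit-learn digits with a Random Forest), network sizes, and phase lengths are all assumptions for demonstration; the paper's Bootstrap Pool is not reproduced here.

```python
# Illustrative sketch of Real-Predictive-Real training (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Discretized search space (illustrative, not the paper's).
GRID = {"n_estimators": [10, 50, 100, 200], "max_depth": [2, 4, 8, 16]}
CHOICES = list(GRID.values())

X, Xv, y, yv = train_test_split(*load_digits(return_X_y=True),
                                test_size=0.3, random_state=0)

class Controller(nn.Module):
    """LSTM agent: one categorical decision per hyperparameter."""
    def __init__(self, hidden=32):
        super().__init__()
        self.cell = nn.LSTMCell(hidden, hidden)
        self.start = nn.Parameter(torch.zeros(1, hidden))
        self.heads = nn.ModuleList([nn.Linear(hidden, len(c)) for c in CHOICES])
        self.embed = nn.ModuleList([nn.Embedding(len(c), hidden) for c in CHOICES])

    def sample(self):
        h = c = torch.zeros(1, self.cell.hidden_size)
        x, log_prob, idxs = self.start, 0.0, []
        for head, emb in zip(self.heads, self.embed):
            h, c = self.cell(x, (h, c))
            dist = Categorical(logits=head(h))
            a = dist.sample()
            log_prob = log_prob + dist.log_prob(a).squeeze()
            idxs.append(a.item())
            x = emb(a)
        return idxs, log_prob

def real_reward(idxs):
    """Expensive reward: validation accuracy of the configured model."""
    params = {k: v[i] for (k, v), i in zip(GRID.items(), idxs)}
    model = RandomForestClassifier(random_state=0, **params).fit(X, y)
    return float(accuracy_score(yv, model.predict(Xv)))

def encode(idxs):
    # One-hot encoding of the sampled configuration for the reward predictor.
    return torch.cat([F.one_hot(torch.tensor(i), len(c)).float()
                      for i, c in zip(idxs, CHOICES)])

predictor = nn.Sequential(nn.Linear(sum(len(c) for c in CHOICES), 32),
                          nn.ReLU(), nn.Linear(32, 1))
agent = Controller()
opt_a = torch.optim.Adam(agent.parameters(), lr=1e-2)
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-2)
baseline = 0.0

def reinforce_step(log_prob, reward):
    global baseline
    baseline = 0.9 * baseline + 0.1 * reward      # moving-average baseline
    loss = -(reward - baseline) * log_prob        # REINFORCE with baseline
    opt_a.zero_grad(); loss.backward(); opt_a.step()

# Phase 1 (Real): train agent and prediction network on real rewards.
for _ in range(20):
    idxs, log_prob = agent.sample()
    r = real_reward(idxs)
    reinforce_step(log_prob, r)
    p_loss = F.mse_loss(predictor(encode(idxs)).squeeze(), torch.tensor(r))
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()

# Phase 2 (Predictive): train the agent on cheap, predicted rewards.
for _ in range(200):
    idxs, log_prob = agent.sample()
    reinforce_step(log_prob, predictor(encode(idxs)).item())

# Phase 3 (Real): fine-tune the agent on real rewards again.
for _ in range(20):
    idxs, log_prob = agent.sample()
    reinforce_step(log_prob, real_reward(idxs))
```

The three loops mirror the Real-Predictive-Real split: the cheap middle phase lets the agent take many more policy-gradient steps than the expensive real evaluations alone would allow, which is the source of the claimed speed-up.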
