Sample-Efficient Automated Deep Reinforcement Learning

Despite significant progress on challenging problems across various domains, applying state-of-the-art deep reinforcement learning (RL) algorithms remains difficult due to their sensitivity to the choice of hyperparameters. This sensitivity can partly be attributed to the non-stationarity of the RL problem, which may call for different hyperparameter settings at different stages of the learning process. Additionally, in the RL setting, hyperparameter optimization (HPO) requires a large number of environment interactions, hindering the transfer of RL's successes to real-world applications. In this work, we tackle the issues of sample-efficient and dynamic HPO in RL. We propose a population-based automated RL (AutoRL) framework to meta-optimize arbitrary off-policy RL algorithms. In this framework, we optimize both the hyperparameters and the neural architecture while simultaneously training the agent. By sharing the collected experience across the population, we substantially increase the sample efficiency of the meta-optimization. We demonstrate the capabilities of our sample-efficient AutoRL approach in a case study with the popular TD3 algorithm on the MuJoCo benchmark suite, where we reduce the number of environment interactions needed for meta-optimization by up to an order of magnitude compared to population-based training.
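
To make the core loop concrete, below is a minimal, self-contained Python sketch of the idea the abstract describes: a population of off-policy agents that all write to and train from a single shared replay buffer, combined with a PBT-style exploit-and-explore step on the hyperparameters. The Agent class, the hyperparameter ranges, and helper names are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch of the population-based AutoRL loop described in the
# abstract: a population of off-policy agents writes into ONE shared
# replay buffer, so every member can train on everyone's transitions.
# Agent, evaluate(), exploit_and_explore(), and all ranges below are
# illustrative assumptions, not the paper's actual implementation.
import random
from collections import deque

SHARED_REPLAY = deque(maxlen=100_000)  # experience shared across the population


def sample_hyperparameters():
    """Draw an initial configuration (illustrative ranges only)."""
    return {
        "lr": 10 ** random.uniform(-5, -3),
        "gamma": random.uniform(0.95, 0.999),
        "hidden_units": random.choice([64, 128, 256]),  # toy architecture knob
    }


class Agent:
    """Placeholder for an off-policy agent such as TD3; learning logic omitted."""

    def __init__(self, hparams):
        self.hparams = hparams
        self.score = float("-inf")

    def collect_experience(self, n_steps):
        # A real agent would act in the environment; we append dummy transitions.
        for _ in range(n_steps):
            SHARED_REPLAY.append(("state", "action", 0.0, "next_state"))

    def train(self, n_updates):
        # Off-policy updates may reuse *all* shared transitions; this reuse is
        # what makes the meta-optimization sample-efficient.
        for _ in range(n_updates):
            if SHARED_REPLAY:
                batch = random.sample(SHARED_REPLAY, min(32, len(SHARED_REPLAY)))
                _ = batch  # a gradient step on `batch` would go here

    def evaluate(self):
        return random.random()  # stand-in for the evaluation return


def exploit_and_explore(agent, population):
    """PBT-style step: copy a stronger member's configuration, then perturb it."""
    better = max(population, key=lambda a: a.score)
    agent.hparams = dict(better.hparams)
    agent.hparams["lr"] *= random.choice([0.8, 1.2])  # perturb a continuous knob


population = [Agent(sample_hyperparameters()) for _ in range(4)]
for generation in range(10):
    for agent in population:
        agent.collect_experience(n_steps=100)  # each member adds to the buffer
        agent.train(n_updates=200)             # but trains on everyone's data
        agent.score = agent.evaluate()
    worst = min(population, key=lambda a: a.score)
    exploit_and_explore(worst, population)     # replace the weakest configuration
```

In the framework the abstract describes, the exploit-and-explore step also covers the neural architecture; the sketch perturbs only the learning rate to stay short, and the buffer sharing is what distinguishes it from vanilla population-based training, where each member would learn only from its own transitions.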
