Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown that it is possible to automate such design choices, and it has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also additional challenges unique to RL, which naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, showing promise in a variety of applications, from RNA design to playing games such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey we seek to unify the field of AutoRL: we provide a common taxonomy, discuss each area in detail, and pose open problems of interest to researchers going forward.
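To make the core idea concrete, below is a minimal sketch, not taken from the survey, of what automating RL design choices can look like in its simplest form: random search over the hyperparameters of tabular Q-learning on a toy chain MDP. All names here (chain_env_step, train_q_learning, the chosen hyperparameter ranges) are illustrative assumptions for this sketch, not an API or recipe from any AutoRL method discussed in the survey.

# A self-contained illustration of AutoRL's core idea: treating RL design
# choices (here, three hyperparameters) as an outer search problem.
# Everything below is a hypothetical toy example, not code from the survey.
import random

N_STATES = 6          # states 0..5; reaching state 5 yields reward 1 and ends the episode
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def chain_env_step(state, action):
    """One step of a simple deterministic chain MDP (illustrative only)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def train_q_learning(alpha, gamma, epsilon, episodes=200, max_steps=50, seed=0):
    """Train tabular Q-learning; return the average return over the last 20 episodes."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    returns = []
    for _ in range(episodes):
        state, total, done = 0, 0.0, False
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = chain_env_step(state, action)
            # Standard Q-learning update; bootstrap only on non-terminal transitions.
            td_target = reward + gamma * max(q[next_state]) * (not done)
            q[state][action] += alpha * (td_target - q[state][action])
            state, total = next_state, total + reward
            if done:
                break
        returns.append(total)
    return sum(returns[-20:]) / 20

# Outer loop: random search over the RL design choices.
rng = random.Random(42)
best_score, best_config = float("-inf"), None
for trial in range(20):
    config = {
        "alpha": 10 ** rng.uniform(-3, 0),   # learning rate, log-uniform in [1e-3, 1]
        "gamma": rng.uniform(0.9, 0.999),    # discount factor
        "epsilon": rng.uniform(0.01, 0.3),   # exploration rate
    }
    score = train_q_learning(**config, seed=trial)
    if score > best_score:
        best_score, best_config = score, config

print("Best average return:", best_score)
print("Best configuration:", best_config)

In this toy setting the outer loop is plain random search and the evaluation is a single full training run; the AutoRL methods surveyed replace these with more sample-efficient strategies (e.g. Bayesian optimization, population-based training, evolution, or meta-gradients) and often adapt the configuration during training rather than only between runs.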
