Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown that it is possible to automate such design choices, and it has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also additional challenges unique to RL, which naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, showing promise in a variety of applications, from RNA design to playing games such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey we seek to unify the field of AutoRL: we provide a common taxonomy, discuss each area in detail, and pose open problems of interest to researchers going forward.
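To make the core idea concrete, below is a minimal sketch, not taken from the survey, of what automating RL design choices can look like in its simplest form: random search over the hyperparameters of tabular Q-learning on a toy chain MDP. All names here (chain_env_step, train_q_learning, the chosen hyperparameter ranges) are illustrative assumptions for this sketch, not an API or recipe from any AutoRL method discussed in the survey.

# A self-contained illustration of AutoRL's core idea: treating RL design
# choices (here, three hyperparameters) as an outer search problem.
# Everything below is a hypothetical toy example, not code from the survey.
import random

N_STATES = 6          # states 0..5; reaching state 5 yields reward 1 and ends the episode
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def chain_env_step(state, action):
    """One step of a simple deterministic chain MDP (illustrative only)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def train_q_learning(alpha, gamma, epsilon, episodes=200, max_steps=50, seed=0):
    """Train tabular Q-learning; return the average return over the last 20 episodes."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    returns = []
    for _ in range(episodes):
        state, total, done = 0, 0.0, False
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = chain_env_step(state, action)
            # Standard Q-learning update; bootstrap only on non-terminal transitions.
            td_target = reward + gamma * max(q[next_state]) * (not done)
            q[state][action] += alpha * (td_target - q[state][action])
            state, total = next_state, total + reward
            if done:
                break
        returns.append(total)
    return sum(returns[-20:]) / 20

# Outer loop: random search over the RL design choices.
rng = random.Random(42)
best_score, best_config = float("-inf"), None
for trial in range(20):
    config = {
        "alpha": 10 ** rng.uniform(-3, 0),   # learning rate, log-uniform in [1e-3, 1]
        "gamma": rng.uniform(0.9, 0.999),    # discount factor
        "epsilon": rng.uniform(0.01, 0.3),   # exploration rate
    }
    score = train_q_learning(**config, seed=trial)
    if score > best_score:
        best_score, best_config = score, config

print("Best average return:", best_score)
print("Best configuration:", best_config)

In this toy setting the outer loop is plain random search and the evaluation is a single full training run; the AutoRL methods surveyed replace these with more sample-efficient strategies (e.g. Bayesian optimization, population-based training, evolution, or meta-gradients) and often adapt the configuration during training rather than only between runs.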
