Bradly C. Stadie | Ge Yang | Rein Houthooft | Xi Chen | Yan Duan | Yuhuai Wu | Pieter Abbeel | Ilya Sutskever
[1] Jürgen Schmidhuber, et al. A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.
[2] S. Hochreiter, et al. Reinforcement Driven Information Acquisition in Non-deterministic Environments, 1995.
[3] Sebastian Thrun, et al. Is Learning The n-th Thing Any Easier Than Learning The First?, 1995, NIPS.
[4] Jürgen Schmidhuber, et al. Reinforcement Learning with Self-Modifying Policies, 1998, Learning to Learn.
[5] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[6] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[7] Jürgen Schmidhuber, et al. Exploring the predictable, 2003.
[8] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[9] David Carmel, et al. Exploration Strategies for Model-based Learning in Multi-agent Systems, 1999, Autonomous Agents and Multi-Agent Systems.
[10] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[11] Jürgen Schmidhuber, et al. Gödel Machines: Fully Self-referential Optimal Universal Self-improvers, 2007, Artificial General Intelligence.
[12] Shuji Hashimoto, et al. Learning from imperfect data, 2007, Appl. Soft Comput.
[13] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[14] Peter Stone, et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.
[15] Tom Schaul, et al. Artificial curiosity for autonomous space exploration, 2011.
[16] Yi Sun, et al. Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments, 2011, AGI.
[17] Jürgen Schmidhuber, et al. Learning skills from play: Artificial curiosity on a Katana robot arm, 2012, IJCNN 2012.
[18] Qiang Yang, et al. Lifelong Machine Learning Systems: Beyond Learning Algorithms, 2013, AAAI Spring Symposium: Lifelong Machine Learning.
[19] Jürgen Schmidhuber, et al. Evolving large-scale neural networks for vision-based reinforcement learning, 2013, GECCO '13.
[20] Sergey Levine, et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015, ArXiv.
[21] Pieter Abbeel, et al. Gradient Estimation Using Stochastic Computation Graphs, 2015, NIPS.
[22] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[23] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[24] Jürgen Schmidhuber, et al. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, 2015, ArXiv.
[25] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[26] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[27] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[28] Alistair A. Young, et al. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, MICCAI 2017.
[29] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[30] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[31] Peter L. Bartlett, et al. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[32] Tom Schaul, et al. FeUdal Networks for Hierarchical Reinforcement Learning, 2017, ICML.
[33] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[34] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[35] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[36] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[37] Shie Mannor, et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft, 2016, AAAI.
[38] Marijn F. Stollenga, et al. Continual curiosity-driven skill acquisition from high-dimensional video inputs for humanoid robots, 2017, Artif. Intell.
[39] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[40] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[41] Bradly C. Stadie, et al. Simulating the stochastic dynamics and cascade failure of power networks, 2018, ArXiv, abs/1806.02420.
[42] Pieter Abbeel, et al. Transfer Learning for Estimating Causal Effects using Neural Networks, 2018, ArXiv.
[43] Bradly C. Stadie, et al. One Demonstration Imitation Learning, 2019.
[44] Tamim Asfour, et al. ProMP: Proximal Meta-Policy Search, 2018, ICLR.
[45] Jimmy Ba, et al. Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning, 2020, ICML.