Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents

There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, emphasizing transparency and potential for in-depth analysis as well as structural richness. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing, and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as a public resource, together with a suite of analysis tools and sample agent trajectories.
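Since the abstract describes a publicly released environment, a minimal sketch of what an agent-environment loop over the release might look like is shown below, assuming a standard dm_env-style interface. The module name `dm_alchemy`, the `EnvironmentSettings`/`load_from_docker` loader, the level string, and the single bounded-array action spec are illustrative assumptions, not a confirmed API.

```python
# Minimal sketch of interacting with the released Alchemy environment.
# Assumes a dm_env-style interface; the package name, loader, level name,
# and action-spec shape below are assumptions for illustration only.
import numpy as np

import dm_alchemy  # assumed package name for the public release

LEVEL_NAME = 'alchemy/perceptual_mapping_randomized_with_random_bottleneck'  # illustrative

settings = dm_alchemy.EnvironmentSettings(seed=123, level_name=LEVEL_NAME)
env = dm_alchemy.load_from_docker(settings)

spec = env.action_spec()  # assumed to be a single bounded array spec
timestep = env.reset()
episode_return = 0.0

# Random-action rollout: each episode resamples a fresh latent causal
# structure, so a meta-learning agent must infer it online from experience.
while not timestep.last():
    action = np.random.uniform(
        spec.minimum, spec.maximum, spec.shape).astype(spec.dtype)
    timestep = env.step(action)
    episode_return += timestep.reward or 0.0

print('episode return:', episode_return)
```

The same loop structure applies to any dm_env-compatible agent; only the action-selection line would change for a learned policy.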
