Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark

Sharada Mohanty (a,b,c) mohanty@aicrowd.com, Jyotish Poonganam (a,b) jyotish@aicrowd.com, Adrien Gaidon (d,e) adrien.gaidon@tri.global, Andrey Kolobov (d,f) akolobov@microsoft.com, Blake Wulfe (d,e) blake.wulfe@tri.global, Dipam Chakraborty (d,c) dipam@aicrowd.com, Gražvydas Šemetulskis (d,g) grazvydas@threethirds.ai, João Schapke (d,h) joaoschapke@gmail.com, Jonas Kubilius (d,g) jonas@threethirds.ai, Jurgis Pašukonis (d,g) jurgis@threethirds.ai, Linas Klimas (d,g) linas@threethirds.ai, Matthew Hausknecht (d,f) matthew.hausknecht@microsoft.com, Patrick MacAlpine (d,f) patmac@gmail.com, Quang Nhat Tran (d,i) quangtran@temple.edu, Thomas Tumiel (d,j) ttumiel@gmail.com, Xiaocheng Tang (d,k) xiaochengtang@didiglobal.com, Xinwei Chen (d,j) o.xlnwel@outlook.com, Christopher Hesse (l) csh@openai.com, Jacob Hilton (l) jhilton@openai.com, William Hebgen Guss (l) wguss@openai.com, Sahika Genc (m) sahika@amazon.com, John Schulman (l) joschu@openai.com, Karl Cobbe (l) karl@openai.com
