Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark

Sharada Mohanty (a,b,c) mohanty@aicrowd.com, Jyotish Poonganam (a,b) jyotish@aicrowd.com, Adrien Gaidon (d,e) adrien.gaidon@tri.global, Andrey Kolobov (d,f) akolobov@microsoft.com, Blake Wulfe (d,e) blake.wulfe@tri.global, Dipam Chakraborty (d,c) dipam@aicrowd.com, Gražvydas Šemetulskis (d,g) grazvydas@threethirds.ai, João Schapke (d,h) joaoschapke@gmail.com, Jonas Kubilius (d,g) jonas@threethirds.ai, Jurgis Pašukonis (d,g) jurgis@threethirds.ai, Linas Klimas (d,g) linas@threethirds.ai, Matthew Hausknecht (d,f) matthew.hausknecht@microsoft.com, Patrick MacAlpine (d,f) patmac@gmail.com, Quang Nhat Tran (d,i) quangtran@temple.edu, Thomas Tumiel (d,j) ttumiel@gmail.com, Xiaocheng Tang (d,k) xiaochengtang@didiglobal.com, Xinwei Chen (d,j) o.xlnwel@outlook.com, Christopher Hesse (l) csh@openai.com, Jacob Hilton (l) jhilton@openai.com, William Hebgen Guss (l) wguss@openai.com, Sahika Genc (m) sahika@amazon.com, John Schulman (l) joschu@openai.com, Karl Cobbe (l) karl@openai.com
