The Atari Grand Challenge Dataset

Recent progress in Reinforcement Learning (RL), fueled by its combination with Deep Learning, has enabled impressive results in learning to interact with complex virtual environments, yet real-world applications of RL are still scarce. A key limitation is data efficiency: current state-of-the-art approaches require millions of training samples. A promising way to tackle this problem is to augment RL with learning from human demonstrations. However, human demonstration data is not yet readily available, which hinders progress in this direction. The present work addresses this problem as follows. We (i) collect and describe a large dataset of human Atari 2600 replays, the largest and most diverse such dataset publicly released to date; (ii) illustrate an example use of this dataset by analyzing the relation between demonstration quality and imitation learning performance; and (iii) outline possible research directions that are opened up by our work.
