The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors

Though deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples. Because state-of-the-art reinforcement learning (RL) systems require exponentially more samples, their development is restricted to a continually shrinking segment of the AI community. Likewise, many of these systems cannot be applied to real-world problems, where environment samples are expensive. Resolving these limitations requires new, sample-efficient methods. To facilitate research in this direction, we introduce the MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors. The primary goal of the competition is to foster the development of algorithms that can efficiently leverage human demonstrations to drastically reduce the number of samples needed to solve complex, hierarchical, and sparse environments. To that end, we introduce: (1) the Minecraft ObtainDiamond task, a sequential decision-making environment requiring long-term planning, hierarchical control, and efficient exploration; and (2) the MineRL-v0 dataset, a large-scale collection of over 60 million state-action pairs of human demonstrations that can be resimulated into embodied trajectories with arbitrary modifications to game state and visuals. Participants will compete to develop systems that solve the ObtainDiamond task with a limited number of samples from the environment simulator, Malmo. The competition is structured into two rounds in which competitors are provided several paired versions of the dataset and environment with different game textures. At the end of each round, competitors will submit containerized versions of their learning algorithms, which will then be trained and evaluated from scratch on a held-out dataset-environment pair for a total of four days on a pre-specified hardware platform.
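As a concrete illustration of the paired dataset-environment setup described above, the sketch below shows how a participant might iterate over human demonstrations and then interact with the ObtainDiamond simulator under the sample budget, using the minerl Python package. The environment id MineRLObtainDiamond-v0 and the batch-iteration parameters are assumptions for illustration; the competition starter kit defines the exact interface.

    # Minimal sketch, assuming the `minerl` package is installed and the
    # demonstration data has already been downloaded locally.
    import gym
    import minerl

    # Iterate over human demonstrations, e.g. for behavioural cloning
    # or pre-training a policy before any environment interaction.
    data = minerl.data.make("MineRLObtainDiamond-v0")  # assumed env id
    for obs, action, reward, next_obs, done in data.batch_iter(
            batch_size=32, seq_len=32, num_epochs=1):
        pass  # e.g. fit a policy to (obs["pov"], action) pairs

    # Interact with the Malmo-backed simulator within the sample limit.
    env = gym.make("MineRLObtainDiamond-v0")
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, done, info = env.step(action)
    env.close()

A submitted system would replace the random action with its learned policy and count every call to env.step against the competition's fixed sample budget.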
