JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

Learning rational behaviors in open-world games like Minecraft remains challenging for Reinforcement Learning (RL) research due to the compound challenges of partial observability, high-dimensional visual perception, and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration. Specifically, our approach includes two levels of hierarchy: the high-level controller learns a policy over options, and the low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning, which captures the underlying relations between actions and representations, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score to date.
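
To make the two-level hierarchy concrete, the following is a minimal sketch, not the paper's implementation: a hypothetical `MetaController` picks an option (sub-task) from the current observation, and the corresponding `Worker` policy controls the agent until its sub-task terminates. All class names, method names, and the stubbed policies are illustrative assumptions.

```python
import random

class Worker:
    """Hypothetical low-level policy for a single sub-task (e.g. 'chop log')."""
    def __init__(self, option_id, n_actions):
        self.option_id = option_id
        self.n_actions = n_actions

    def act(self, obs):
        # Placeholder: a real worker would map pixel observations to
        # actions with a learned network.
        return random.randrange(self.n_actions)

    def done(self, obs):
        # Sub-task termination check (e.g. target item obtained); stubbed here.
        return random.random() < 0.01

class MetaController:
    """Hypothetical high-level policy over options."""
    def __init__(self, n_options):
        self.n_options = n_options

    def select_option(self, obs):
        # A learned high-level policy would pick the next sub-task;
        # random choice here is purely for illustration.
        return random.randrange(self.n_options)

def run_episode(env, meta, workers, max_steps=1000):
    obs, steps = env.reset(), 0
    while steps < max_steps:
        worker = workers[meta.select_option(obs)]
        # The chosen worker keeps control until its sub-task terminates.
        while steps < max_steps and not worker.done(obs):
            obs, reward, done, info = env.step(worker.act(obs))
            steps += 1
            if done:
                return
```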
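
For the action-aware representation learning component, one standard way to tie actions to representations is an inverse-dynamics auxiliary loss: predicting the action taken between two consecutive frames forces the encoder to retain action-relevant features. Whether this matches the paper's exact objective is an assumption; the sketch below uses flat observations for brevity, whereas a Minecraft agent would use a convolutional encoder.

```python
import torch
import torch.nn as nn

class ActionAwareEncoder(nn.Module):
    """Encoder with an inverse-dynamics head (illustrative formulation)."""
    def __init__(self, obs_dim, latent_dim, n_actions):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Predicts the discrete action from two consecutive latents.
        self.inverse_head = nn.Linear(2 * latent_dim, n_actions)

    def forward(self, obs):
        return self.encoder(obs)

    def inverse_loss(self, obs, next_obs, action):
        z, z_next = self.encoder(obs), self.encoder(next_obs)
        logits = self.inverse_head(torch.cat([z, z_next], dim=-1))
        return nn.functional.cross_entropy(logits, action)
```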
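
The discriminator-based self-imitation idea can likewise be sketched under one plausible reading, not stated in detail in the abstract: a binary discriminator is trained to tell the agent's own stored high-return transitions from its fresh rollouts, and its output serves as an intrinsic reward that pulls the policy back toward past good behavior. Actions are assumed to be one-hot encoded floats.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Classifies whether a (state, action) pair resembles the agent's
    stored high-return transitions (label 1) or a fresh rollout (label 0)."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def discriminator_loss(disc, good_obs, good_act, roll_obs, roll_act):
    # The agent's own past good transitions are positives; rollouts negatives.
    bce = nn.BCEWithLogitsLoss()
    good_logits = disc(good_obs, good_act)
    roll_logits = disc(roll_obs, roll_act)
    return (bce(good_logits, torch.ones_like(good_logits)) +
            bce(roll_logits, torch.zeros_like(roll_logits)))

def intrinsic_reward(disc, obs, act):
    # Higher when the transition resembles past high-return behavior;
    # added to the environment reward during policy optimization.
    with torch.no_grad():
        return torch.sigmoid(disc(obs, act)).squeeze(-1)
```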
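
Finally, ensemble behavior cloning with consistency filtering might be instantiated as below; the abstract does not say whether filtering applies to demonstrations or at action time, so this sketch assumes an action-time agreement filter, with `make_policy`, `min_agree`, and the training loop all hypothetical.

```python
import torch
import torch.nn as nn

def train_bc_ensemble(make_policy, demos, n_models=5, epochs=10, lr=1e-3):
    """Train several behavior-cloning policies on (obs, action) demo batches."""
    models = [make_policy() for _ in range(n_models)]
    loss_fn = nn.CrossEntropyLoss()
    for model in models:
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for obs, act in demos:
                opt.zero_grad()
                loss_fn(model(obs), act).backward()
                opt.step()
    return models

def consistent_action(models, obs, min_agree=4):
    """Consistency filter: act only when enough ensemble members agree
    on the same discrete action for this (single, unbatched) observation."""
    votes = torch.stack([m(obs).argmax(dim=-1) for m in models])
    choice, _ = votes.mode(dim=0)
    if (votes == choice).sum() >= min_agree:
        return int(choice)
    return None  # caller may fall back to a default or exploratory action
```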
