A Deep Hierarchical Approach to Lifelong Learning in Minecraft

We propose a lifelong learning system that can reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game that poses an unsolved, high-dimensional lifelong learning problem. These reusable skills, which we refer to as Deep Skill Networks, are then incorporated into our novel Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using two techniques: (1) a deep skill array and (2) skill distillation, our novel variation of policy distillation (Rusu et al., 2015) for learning skills. Skill distillation enables the H-DRLN to efficiently retain knowledge, and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills in a single distilled network. The H-DRLN exhibits superior performance and lower learning sample complexity than the regular Deep Q-Network (Mnih et al., 2015) in sub-domains of Minecraft.
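To make the distillation idea concrete, below is a minimal sketch of the soft-target KL objective used in policy distillation (Rusu et al., 2015), on which skill distillation builds: the teacher's Q-values are softened with a low temperature, and the student is trained to match the resulting action distribution. The function names and the temperature value here are illustrative, not the paper's exact implementation.

```python
import numpy as np

def softmax(q, tau):
    """Temperature-scaled softmax over Q-values (low tau sharpens the teacher's policy)."""
    z = q / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_q, student_q, tau=0.1):
    """KL(teacher || student) on softened Q-values, the policy-distillation objective.

    In skill distillation, teacher_q would come from one of several pre-trained
    Deep Skill Networks and student_q from the single multi-skill student network.
    """
    p = softmax(teacher_q, tau)  # soft targets from the skill (teacher) network
    q = softmax(student_q, 1.0)  # student's action distribution
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())
```

The loss vanishes exactly when the student's action distribution reproduces the teacher's softened targets, so minimizing it over states sampled from all skills compresses multiple teachers into one network.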

[1] S. C. Suddarth et al., "Rule-Injection Hints as a Means of Improving Network Performance and Learning Time," EURASIP Workshop, 1990.

[2] Long-Ji Lin, "Reinforcement Learning for Robots Using Neural Networks," 1992.

[3] Sebastian Thrun et al., "Lifelong Robot Learning," Robotics Auton. Syst., 1993.

[4] Doina Precup et al., "Multi-time Models for Temporally Abstract Planning," NIPS, 1997.

[5] Milos Hauskrecht et al., "Hierarchical Solution of Markov Decision Processes using Macro-actions," UAI, 1998.

[6] Doina Precup et al., "Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning," Artif. Intell., 1999.

[7] Manuela M. Veloso et al., "Layered Learning," ECML, 2000.

[8] Doina Precup et al., "Learning Options in Reinforcement Learning," SARA, 2002.

[9] Peter Stone et al., "Reinforcement Learning for RoboCup Soccer Keepaway," Adapt. Behav., 2005.

[10] Jason Weston et al., "Curriculum Learning," ICML, 2009.

[11] Geoffrey E. Hinton et al., "ImageNet Classification with Deep Convolutional Neural Networks," 2012.

[12] Bruno Castro da Silva et al., "Learning Parameterized Skills," ICML, 2012.

[13] Qiang Yang et al., "Lifelong Machine Learning Systems: Beyond Learning Algorithms," AAAI Spring Symposium on Lifelong Machine Learning, 2013.

[14] Eric Eaton et al., "ELLA: An Efficient Lifelong Learning Algorithm," ICML, 2013.

[15] Eric Eaton et al., "Online Multi-Task Learning for Policy Gradient Methods," ICML, 2014.

[16] Shie Mannor et al., "Time-Regularized Interrupting Options," ICML, 2014.

[17] Lihong Li et al., "PAC-inspired Option Discovery in Lifelong Reinforcement Learning," ICML, 2014.

[18] Shie Mannor et al., "Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations," ICML, 2014.

[19] Geoffrey E. Hinton et al., "Distilling the Knowledge in a Neural Network," arXiv, 2015.

[20] Xiaoping Chen et al., "Online Planning for Large Markov Decision Processes with Hierarchical Decomposition," ACM Trans. Intell. Syst. Technol., 2015.

[21] Shane Legg et al., "Human-level Control through Deep Reinforcement Learning," Nature, 2015.

[22] Eric Eaton et al., "Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret," ICML, 2015.

[23] Leslie N. Smith et al., "Selecting Subgoals using Deep Learning in Minecraft: A Preliminary Report," 2016.

[24] Razvan Pascanu et al., "Policy Distillation," ICLR, 2015.

[25] Ruslan Salakhutdinov et al., "Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning," ICLR, 2015.

[26] David Silver et al., "Deep Reinforcement Learning with Double Q-Learning," AAAI, 2015.

[27] Alex Graves et al., "Strategic Attentive Writer for Learning Macro-Actions," NIPS, 2016.

[28] Tom Schaul et al., "Dueling Network Architectures for Deep Reinforcement Learning," ICML, 2015.

[29] Honglak Lee et al., "Control of Memory, Active Perception, and Action in Minecraft," ICML, 2016.

[30] Razvan Pascanu et al., "Progressive Neural Networks," arXiv, 2016.

[31] Alex Graves et al., "Asynchronous Methods for Deep Reinforcement Learning," ICML, 2016.

[32] Shie Mannor et al., "Graying the Black Box: Understanding DQNs," ICML, 2016.

[33] Wojciech Jaskowski et al., "ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning," IEEE Conference on Computational Intelligence and Games (CIG), 2016.

[34] Tom Schaul et al., "Prioritized Experience Replay," ICLR, 2015.

[35] Joshua B. Tenenbaum et al., "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation," NIPS, 2016.

[36] Shie Mannor et al., "Adaptive Skills Adaptive Partitions (ASAP)," NIPS, 2016.

[37] Marc G. Bellemare et al., "Increasing the Action Gap: New Operators for Reinforcement Learning," AAAI, 2015.

[38] Shie Mannor et al., "Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)," arXiv, 2016.

[39] Doina Precup et al., "The Option-Critic Architecture," AAAI, 2016.

[40] Tom Schaul et al., "Reinforcement Learning with Unsupervised Auxiliary Tasks," ICLR, 2016.