Policy Distillation

Abstract: Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.
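
The abstract describes the method only at a high level, so the sketch below illustrates the kind of supervised training step that distilling a DQN teacher into a smaller student could reduce to. This is a minimal sketch, not the paper's implementation: the choice of matching the teacher's temperature-softened action distribution with a KL-divergence loss, the network sizes, the temperature value, and the placeholder data batch are all illustrative assumptions, not details taken from the text above.

```python
# Minimal sketch of one policy-distillation update (PyTorch).
# Assumptions (not from the abstract): the teacher is a pre-trained Q-network
# whose outputs are softened with a temperature tau, and the smaller student
# is trained to match that distribution with a KL-divergence loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 18   # Atari joystick action set (illustrative)
OBS_DIM = 512      # flattened observation features (illustrative)

def make_net(hidden):
    return nn.Sequential(
        nn.Linear(OBS_DIM, hidden), nn.ReLU(),
        nn.Linear(hidden, NUM_ACTIONS),
    )

teacher = make_net(hidden=1024)   # large expert network; weights assumed already trained
student = make_net(hidden=64)     # much smaller distilled network
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
tau = 0.01                        # low temperature sharpens the teacher's Q-values

def distill_step(observations):
    """One supervised step: student matches the teacher's softened policy."""
    with torch.no_grad():
        teacher_q = teacher(observations)
        target = F.softmax(teacher_q / tau, dim=-1)        # softened teacher policy
    student_logp = F.log_softmax(student(observations), dim=-1)
    loss = F.kl_div(student_logp, target, reduction="batchmean")  # KL(teacher || student)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: in practice the batch would come from the teacher's replay memory;
# here a random tensor stands in for real observations.
batch = torch.randn(32, OBS_DIM)
print(distill_step(batch))
```

In this reading, a low temperature concentrates the target distribution on the actions the expert would actually choose, which is one plausible way to keep a small student focused on expert behaviour rather than on regressing every Q-value exactly.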
