Curriculum Distillation to Teach Playing Atari

by Chen Tang

Master of Science in Electrical Engineering and Computer Science
University of California, Berkeley
Professor John F. Canny, Chair

We propose a framework for curriculum distillation in the setting of deep reinforcement learning. A machine teacher selects samples from its own training history and sends them to a learner to speed up the learner's progress. In this paper, we investigate how to select these samples to maximize the learner's progress. One key idea is to apply the Zone of Proximal Development principle, guiding the learner with samples slightly in advance of its current performance level. Another is to use the samples on which the teacher itself made the largest progress in its parameter space. To foster robust teaching and learning, we adapt the framework to distill a curriculum from multiple teachers. We test the framework on a few Atari games and show that the selected samples are interpretable to humans and help machine learners converge faster during training.
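To make the Zone of Proximal Development idea concrete, the following is a minimal sketch and not the thesis's actual selection rule. It assumes each sample in the teacher's history is tagged with the teacher's average score at the time it was recorded, and that this score can stand in for the sample's difficulty. The names `Sample`, `select_zpd_samples`, `margin`, and `k` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    step: int             # teacher training step at which the sample was recorded
    teacher_score: float  # teacher's average score around that step (assumed difficulty proxy)


def select_zpd_samples(history: List[Sample], learner_score: float,
                       margin: float, k: int) -> List[Sample]:
    """Pick up to k samples recorded when the teacher was slightly ahead of the learner.

    A sample qualifies if learner_score < teacher_score <= learner_score + margin.
    Qualifying samples are ordered from closest to farthest above the learner.
    """
    candidates = [
        s for s in history
        if learner_score < s.teacher_score <= learner_score + margin
    ]
    candidates.sort(key=lambda s: s.teacher_score - learner_score)
    return candidates[:k]


# Hypothetical teacher history; the scores are made-up illustration values.
history = [Sample(100, 10.0), Sample(200, 20.0), Sample(300, 25.0), Sample(400, 40.0)]
chosen = select_zpd_samples(history, learner_score=18.0, margin=10.0, k=2)
chosen_steps = [s.step for s in chosen]
```

With these illustrative numbers, only the samples recorded at scores 20 and 25 fall inside the learner's window of (18, 28]. The sample at score 40 is treated as too far ahead, and the one at score 10 as already mastered.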
