Teacher–Student Curriculum Learning

We propose Teacher–Student Curriculum Learning (TSCL), a framework for automatic curriculum learning in which the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms built on the intuition that the Student should practice most on the tasks where it makes the fastest progress, i.e., where the slope of its learning curve is steepest. The Teacher algorithms also address forgetting by additionally choosing tasks on which the Student's performance is getting worse. We demonstrate that TSCL matches or surpasses carefully hand-crafted curricula on two tasks: decimal-number addition with a long short-term memory (LSTM) network and navigation in Minecraft. Our automatically ordered curriculum of submazes made it possible to solve a Minecraft maze that could not be solved at all by training directly on that maze, and learning was an order of magnitude faster than with uniform sampling of those submazes.
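The slope-based Teacher can be viewed as a non-stationary multi-armed bandit over tasks. Below is a minimal Python sketch of that idea, assuming an epsilon-greedy selection rule and a sliding-window slope estimate; the class name BanditTeacher, the window size, and the method names choose_task/update are illustrative assumptions, not the paper's exact algorithms. It fits a line to each task's recent scores and prefers the task with the largest absolute slope, so both fast-improving tasks and tasks being forgotten get practice.

```python
import random
from collections import deque

import numpy as np


class BanditTeacher:
    """Hypothetical sketch of a slope-based Teacher (not the paper's exact algorithm).

    Keeps a sliding window of the Student's recent scores per task,
    estimates the slope of each learning curve by a linear fit, and
    picks tasks epsilon-greedily by absolute slope, so that both
    fast-improving and being-forgotten tasks are selected.
    """

    def __init__(self, n_tasks, window=10, eps=0.1):
        self.n_tasks = n_tasks
        self.eps = eps
        self.scores = [deque(maxlen=window) for _ in range(n_tasks)]

    def _slope(self, ys):
        # Tasks with fewer than two scores are treated as maximally
        # interesting so every task gets tried at least twice.
        if len(ys) < 2:
            return float("inf")
        ys = np.asarray(ys, dtype=float)
        xs = np.arange(len(ys))
        return np.polyfit(xs, ys, 1)[0]  # slope of the linear fit

    def choose_task(self):
        if random.random() < self.eps:
            return random.randrange(self.n_tasks)  # explore uniformly
        # Exploit: |slope| rewards both rapid progress and forgetting.
        return max(range(self.n_tasks),
                   key=lambda t: abs(self._slope(self.scores[t])))

    def update(self, task, score):
        self.scores[task].append(score)
```

A training loop would then alternate teacher.choose_task(), one Student training episode on that task, and teacher.update(task, score) with the resulting evaluation score.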
