Learning to Teach

Teaching plays an important role in society by spreading human knowledge and educating the next generation. A good teacher selects appropriate teaching materials, imparts suitable methodologies, and sets up targeted examinations according to the learning behaviors of the students. In the field of artificial intelligence, however, the role of teaching has not been fully explored; most attention is paid to machine \emph{learning}. In this paper, we argue that equal attention, if not more, should be paid to teaching, and furthermore, that good teaching strategies should be obtained through an optimization framework rather than heuristics. We call this approach “learning to teach”. In this approach, two intelligent agents interact with each other: a student model (corresponding to the learner in traditional machine learning algorithms) and a teacher model (which determines the appropriate data, loss function, and hypothesis space to facilitate the training of the student model). The teacher model leverages feedback from the student model to optimize its own teaching strategies by means of reinforcement learning, so as to achieve teacher-student co-evolution. To demonstrate the practical value of the proposed approach, we take the training of deep neural networks (DNNs) as an example and show that, using learning-to-teach techniques, we can use much less training data and many fewer iterations to achieve almost the same accuracy for different kinds of DNN models (e.g., multi-layer perceptrons, convolutional neural networks, and recurrent neural networks) on various machine learning tasks (e.g., image classification and text understanding).
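
To make the teacher-student interaction concrete, below is a minimal, self-contained sketch (in Python/NumPy) of the data-teaching case: a teacher policy decides, per training instance, whether the student sees it, and is updated with REINFORCE using the student's held-out accuracy as the reward. All names, the single loss-based state feature, and the hyperparameters are illustrative assumptions for a toy logistic-regression student, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy student task: logistic regression on synthetic 2-class data ----
d = 5
w_true = rng.normal(size=d)
X = rng.normal(size=(2000, d))
y = (X @ w_true + 0.5 * rng.normal(size=2000) > 0).astype(float)
X_dev, y_dev = X[:500], y[:500]          # held-out set for the teacher's reward
X_tr, y_tr = X[500:], y[500:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dev_accuracy(w):
    return float(((sigmoid(X_dev @ w) > 0.5) == y_dev).mean())

# --- Teacher: a per-instance keep/drop policy over simple state features -
# The state here is just [bias, current per-instance loss]; a stand-in for
# the richer data/student/dynamics features the paper describes.
theta = np.zeros(2)                       # teacher parameters
alpha_teacher, alpha_student = 0.1, 0.5
baseline = 0.0                            # moving-average reward baseline

for episode in range(30):
    w = np.zeros(d)                        # re-initialize the student
    grad_log = np.zeros_like(theta)        # accumulated grad of log-policy
    for start in range(0, len(X_tr), 50):  # stream mini-batches
        xb, yb = X_tr[start:start + 50], y_tr[start:start + 50]
        p = sigmoid(xb @ w)
        loss = -(yb * np.log(p + 1e-9) + (1 - yb) * np.log(1 - p + 1e-9))
        state = np.stack([np.ones_like(loss), loss], axis=1)
        keep_p = sigmoid(state @ theta)               # P(keep instance)
        keep = rng.random(len(keep_p)) < keep_p       # sample teaching action
        # d/dtheta log P(action) for a Bernoulli policy: (a - p) * state
        grad_log += (state * (keep - keep_p)[:, None]).sum(axis=0)
        if keep.any():                                # one student SGD step
            pk = sigmoid(xb[keep] @ w)
            w -= alpha_student * xb[keep].T @ (pk - yb[keep]) / keep.sum()
    reward = dev_accuracy(w)                          # terminal episode reward
    theta += alpha_teacher * (reward - baseline) * grad_log   # REINFORCE
    baseline = 0.9 * baseline + 0.1 * reward
    print(f"episode {episode:2d}  dev accuracy {reward:.3f}")
```

In the paper's setting, the teacher's state would include richer features of the data, the student model, and the training dynamics, and the student would be a deep network rather than a linear model; the episodic structure and the policy-gradient update, however, follow the same pattern.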
