Learning to teach

Teaching plays an important role in our society: it spreads human knowledge and educates the next generations. A good teacher selects appropriate teaching materials, imparts suitable methodologies, and sets up targeted examinations according to the learning behaviors of the students. In the field of artificial intelligence, however, the role of teaching has not been fully explored; most attention is paid to machine \emph{learning}. In this paper, we argue that equal attention, if not more, should be paid to teaching, and furthermore, that an optimization framework (instead of heuristics) should be used to obtain good teaching strategies. We call this approach `learning to teach'. In this approach, two intelligent agents interact with each other: a student model (which corresponds to the learner in traditional machine learning algorithms) and a teacher model (which determines the appropriate data, loss function, and hypothesis space to facilitate the training of the student model). The teacher model leverages feedback from the student model to optimize its own teaching strategies by means of reinforcement learning, so as to achieve teacher-student co-evolution. To demonstrate the practical value of the proposed approach, we take the training of deep neural networks (DNNs) as an example and show that, by using the learning-to-teach techniques, we can use much less training data and far fewer iterations to achieve almost the same accuracy for different kinds of DNN models (e.g., multi-layer perceptrons, convolutional neural networks, and recurrent neural networks) on various machine learning tasks (e.g., image classification and text understanding).
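As a concrete illustration of the data-teaching case, the sketch below shows one way the teacher-student interaction loop can be realized: a teacher policy filters each mini-batch for a student classifier and is updated with REINFORCE, using the student's held-out accuracy after a fixed training budget as the reward. This is a minimal sketch assuming PyTorch; the class names (`StudentModel`, `TeacherPolicy`), the state features, the reward definition, and the synthetic data are illustrative assumptions, not the exact implementation used in the paper.

```python
# Minimal sketch of a learning-to-teach data-teaching loop (illustrative, not
# the paper's exact implementation). A teacher policy decides which examples
# in each mini-batch the student trains on, and is updated with REINFORCE.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentModel(nn.Module):
    """The learner: a small multi-layer perceptron for 10-class 28x28 images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(784, 128),
                                 nn.ReLU(), nn.Linear(128, 10))
    def forward(self, x):
        return self.net(x)

class TeacherPolicy(nn.Module):
    """The teacher: maps per-example state features to a keep-probability."""
    def __init__(self, feat_dim=3):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)
    def forward(self, feats):
        return torch.sigmoid(self.fc(feats)).squeeze(-1)

def state_features(y, logits, step, n_steps):
    """Per-example state: current loss, training progress, prob. of the true label."""
    loss = F.cross_entropy(logits, y, reduction='none').unsqueeze(1)
    progress = torch.full_like(loss, step / n_steps)
    p_true = logits.softmax(-1).gather(1, y.unsqueeze(1))
    return torch.cat([loss, progress, p_true], dim=1)

def train_one_episode(teacher, train_loader, dev_loader):
    """Train a fresh student on teacher-filtered batches; return (reward, log-prob)."""
    student, log_probs = StudentModel(), []
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    n_steps = len(train_loader)
    for step, (x, y) in enumerate(train_loader):
        with torch.no_grad():
            logits = student(x)
        keep_prob = teacher(state_features(y, logits, step, n_steps))
        keep = torch.bernoulli(keep_prob).detach()         # sampled keep/drop actions
        log_probs.append((keep * keep_prob.clamp_min(1e-8).log()
                          + (1 - keep) * (1 - keep_prob).clamp_min(1e-8).log()).sum())
        if keep.sum() == 0:
            continue                                       # teacher dropped the whole batch
        loss = F.cross_entropy(student(x[keep.bool()]), y[keep.bool()])
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                                  # reward: held-out accuracy
        correct = sum((student(x).argmax(-1) == y).sum().item() for x, y in dev_loader)
        total = sum(y.numel() for _, y in dev_loader)
    return correct / total, torch.stack(log_probs).sum()

def make_loader(n, batch=64):                              # synthetic stand-in data
    x, y = torch.randn(n, 1, 28, 28), torch.randint(0, 10, (n,))
    return torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(x, y), batch_size=batch, shuffle=True)

train_loader, dev_loader = make_loader(6400), make_loader(1280)
teacher = TeacherPolicy()
teacher_opt = torch.optim.Adam(teacher.parameters(), lr=1e-3)
baseline = 0.0                                             # running-mean reward baseline
for episode in range(10):                                  # teacher-student co-evolution
    reward, sum_log_prob = train_one_episode(teacher, train_loader, dev_loader)
    teacher_loss = -(reward - baseline) * sum_log_prob     # REINFORCE objective
    teacher_opt.zero_grad(); teacher_loss.backward(); teacher_opt.step()
    baseline = 0.9 * baseline + 0.1 * reward
```

Because the reward (the student's accuracy after a fixed training budget) is not differentiable with respect to the teacher's parameters, a policy-gradient update such as REINFORCE is a natural choice; the running-mean baseline in the sketch only serves to reduce the variance of that gradient estimate.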
