Incremental Sequence Learning

Deep learning research over the past years has shown that by increasing the scope or difficulty of the learning problem over time, increasingly complex learning problems can be addressed. We study incremental learning in the context of sequence learning, using generative RNNs in the form of multi-layer recurrent Mixture Density Networks. While the potential of incremental or curriculum learning to enhance learning is known, indiscriminate application of the principle does not necessarily lead to improvement, and it is essential therefore to know which forms of incremental or curriculum learning have a positive effect. This research contributes to that aim by comparing three instantiations of incremental or curriculum learning. We introduce Incremental Sequence Learning, a simple incremental approach to sequence learning. Incremental Sequence Learning starts out by using only the first few steps of each sequence as training data. Each time a performance criterion has been reached, the length of the parts of the sequences used for training is increased. We introduce and make available a novel sequence learning task and data set: predicting and classifying MNIST pen stroke sequences. We find that Incremental Sequence Learning greatly speeds up sequence learning and reaches the best test performance level of regular sequence learning 20 times faster, reduces the test error by 74%, and in general performs more robustly; it displays lower variance and achieves sustained progress after all three comparison methods have stopped improving. The other instantiations of curriculum learning do not result in any noticeable improvement. A trained sequence prediction model is also used in transfer learning to the task of sequence classification, where it is found that transfer learning realizes improved classification performance compared to methods that learn to classify from scratch.

[1]  Risto Miikkulainen,et al.  Incremental Evolution of Complex General Behavior , 1997, Adapt. Behav..

[2]  C. Lee Giles,et al.  Sequence learning: from recognition and prediction to sequential decision making , 2001, IEEE Intelligent Systems.

[3]  Wojciech Zaremba,et al.  Learning to Execute , 2014, ArXiv.

[4]  Sebastian Thrun,et al.  Is Learning The n-th Thing Any Easier Than Learning The First? , 1995, NIPS.

[5]  Razvan Pascanu,et al.  Sim-to-Real Robot Learning from Pixels with Progressive Nets , 2016, CoRL.

[6]  Zachary Chase Lipton A Critical Review of Recurrent Neural Networks for Sequence Learning , 2015, ArXiv.

[7]  Samy Bengio,et al.  Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.

[8]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[9]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[10]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  J. Elman Learning and development in neural networks: the importance of starting small , 1993, Cognition.

[12]  Lorien Y. Pratt,et al.  Discriminability-Based Transfer between Neural Networks , 1992, NIPS.

[13]  S. Srihari Mixture Density Networks , 1994 .

[14]  Richard Granger,et al.  Incremental Learning from Noisy Data , 1986, Machine Learning.

[15]  Christophe G. Giraud-Carrier,et al.  A Note on the Utility of Incremental Learning , 2000, AI Commun..

[16]  Ching Y. Suen,et al.  A fast parallel algorithm for thinning digital patterns , 1984, CACM.

[17]  David E. Moriarty,et al.  Symbiotic Evolution of Neural Networks in Sequential Decision Tasks , 1997 .

[18]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[21]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[22]  Mark W. Schmidt,et al.  Structure learning in random fields for heart motion abnormality detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Jorge Nocedal,et al.  On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.

[24]  Geoffrey E. Hinton,et al.  Inferring Motor Programs from Images of Handwritten Digits , 2005, NIPS.

[25]  Ruslan Salakhutdinov,et al.  Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.