Selfless Sequential Learning

Sequential learning, also called lifelong learning, studies the problem of learning tasks in a sequence with access restricted to only the data of the current task. In this paper we look at a scenario with fixed model capacity and postulate that the learning process should not be selfish, i.e. it should account for future tasks to be added and thus leave enough capacity for them. To achieve Selfless Sequential Learning we study different regularization strategies and activation functions. We find that imposing sparsity at the level of the representation (i.e. neuron activations) is more beneficial for sequential learning than encouraging parameter sparsity. In particular, we propose a novel regularizer that encourages representation sparsity by means of neural inhibition. It results in few active neurons, which in turn leaves more free neurons to be utilized by upcoming tasks. As neural inhibition over an entire layer can be too drastic, especially for complex tasks requiring strong representations, our regularizer only inhibits other neurons in a local neighbourhood, inspired by lateral inhibition processes in the brain. We combine our novel regularizer with state-of-the-art lifelong learning methods that penalize changes to important previously learned parts of the network. We show that our new regularizer leads to increased sparsity, which translates into a consistent performance improvement over the alternative regularizers we studied, on diverse datasets.
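To illustrate the kind of regularizer the abstract describes, the sketch below penalizes co-activations of neighbouring neurons with a Gaussian locality weight, so each neuron only inhibits units close to it in the layer. This is a minimal PyTorch sketch under stated assumptions, not the paper's exact formulation; the function name local_inhibition_penalty and the parameters sigma (neighbourhood width) and strength (penalty weight) are illustrative choices, not names from the paper.

import torch

def local_inhibition_penalty(h, sigma=2.0, strength=1e-4):
    # h: (batch, n_units) non-negative activations (e.g. ReLU outputs) of one layer.
    n = h.shape[1]
    idx = torch.arange(n, dtype=h.dtype, device=h.device)
    # Gaussian locality weights: pairs of units far apart in the layer are barely
    # penalized, so inhibition only acts within a local neighbourhood.
    w = torch.exp(-(idx[None, :] - idx[:, None]) ** 2 / (2 * sigma ** 2))
    w.fill_diagonal_(0)  # a unit does not inhibit itself
    # Mean pairwise co-activation over the batch; large values mean neighbouring
    # units tend to fire together, which the penalty discourages.
    coact = (h.t() @ h) / h.shape[0]
    return strength * (w * coact).sum()

# Usage sketch: add the penalty on a hidden layer's activations to the loss of the
# current task, alongside any importance-based penalty on previously learned weights.
# loss = task_loss + local_inhibition_penalty(hidden_activations)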
