Scalable and Order-robust Continual Learning with Hierarchically Decomposed Networks

While recent continual learning methods largely alleviate catastrophic forgetting on toy-sized datasets, several issues remain to be tackled before they can be applied to real-world problem domains. First, a continual learning model should effectively handle catastrophic forgetting and be efficient to train even with a large number of tasks. Second, it needs to tackle order-sensitivity, where per-task performance varies greatly depending on the order of the task arrival sequence, which can cause serious problems in domains where fairness is critical (e.g., medical diagnosis). To tackle these practical challenges, we propose a novel continual learning method that is both scalable and order-robust: instead of learning a completely shared set of weights, it represents the parameters for each task as the sum of task-shared and sparse task-adaptive parameters. With our hierarchically decomposed networks (HDN), the task-adaptive parameters for earlier tasks remain mostly unaffected; we update them only to reflect changes made to the task-shared parameters. This decomposition of parameters effectively prevents catastrophic forgetting and order-sensitivity, while being computation- and memory-efficient. Further, with hierarchical knowledge consolidation, which clusters the task-adaptive parameters to obtain hierarchically shared parameters, HDN becomes highly scalable. We validate HDN on multiple benchmark datasets against state-of-the-art continual learning methods, which it largely outperforms in accuracy, efficiency, scalability, and order-robustness.
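To make the additive decomposition concrete, below is a minimal PyTorch sketch of a single layer whose effective weight for task t is the task-shared weight plus a sparse task-adaptive delta, as described above. The class name DecomposedLinear, the zero initialization of per-task deltas, and the L1 sparsity penalty are illustrative assumptions for this sketch, not the paper's exact formulation (which also includes the hierarchical consolidation step over the task-adaptive parameters).

import torch
import torch.nn as nn

class DecomposedLinear(nn.Module):
    """Sketch: per-task weight = shared weight + sparse task-adaptive delta."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One sparse task-adaptive parameter tensor per task, added on demand.
        self.task_adaptive = nn.ParameterDict()

    def add_task(self, task_id):
        # New tasks start from the shared weights (delta initialized to zero).
        self.task_adaptive[str(task_id)] = nn.Parameter(torch.zeros_like(self.shared))

    def forward(self, x, task_id):
        weight = self.shared + self.task_adaptive[str(task_id)]
        return torch.nn.functional.linear(x, weight, self.bias)

    def sparsity_penalty(self, task_id):
        # L1 penalty keeps the task-adaptive part sparse, so most capacity
        # stays in the shared parameters (assumed regularizer for this sketch).
        return self.task_adaptive[str(task_id)].abs().sum()

# Usage: earlier tasks' deltas stay (mostly) frozen while the shared weights
# and the current task's delta are trained.
layer = DecomposedLinear(16, 4)
layer.add_task(0)
out = layer(torch.randn(8, 16), task_id=0)
loss = out.pow(2).mean() + 1e-3 * layer.sparsity_penalty(0)
loss.backward()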
