Scalable and Order-robust Continual Learning with Hierarchically Decomposed Networks

While recent continual learning methods largely alleviate catastrophic forgetting on toy-sized datasets, several issues remain to be tackled before they can be applied to real-world problem domains. First, a continual learning model should effectively handle catastrophic forgetting and be efficient to train even with a large number of tasks. Second, it needs to tackle order-sensitivity, where per-task performance varies greatly depending on the order of the task arrival sequence, which can cause serious problems in domains where fairness is critical (e.g., medical diagnosis). To tackle these practical challenges, we propose a novel continual learning method that is both scalable and order-robust: instead of learning a completely shared set of weights, it represents the parameters for each task as the sum of task-shared and sparse task-adaptive parameters. With our hierarchically decomposed networks (HDN), the task-adaptive parameters for earlier tasks remain mostly unaffected; we update them only to reflect changes made to the task-shared parameters. This decomposition of parameters effectively prevents catastrophic forgetting and order-sensitivity, while being computation- and memory-efficient. Further, with hierarchical knowledge consolidation, which clusters the task-adaptive parameters to obtain hierarchically shared parameters, HDN becomes highly scalable. We validate HDN on multiple benchmark datasets against state-of-the-art continual learning methods, which it largely outperforms in accuracy, efficiency, scalability, and order-robustness.
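To make the additive decomposition concrete, below is a minimal PyTorch sketch of a single layer whose effective weight for task t is the task-shared weight plus a sparse task-adaptive delta, as described above. The class name DecomposedLinear, the zero initialization of per-task deltas, and the L1 sparsity penalty are illustrative assumptions for this sketch, not the paper's exact formulation (which also includes the hierarchical consolidation step over the task-adaptive parameters).

import torch
import torch.nn as nn

class DecomposedLinear(nn.Module):
    """Sketch: per-task weight = shared weight + sparse task-adaptive delta."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One sparse task-adaptive parameter tensor per task, added on demand.
        self.task_adaptive = nn.ParameterDict()

    def add_task(self, task_id):
        # New tasks start from the shared weights (delta initialized to zero).
        self.task_adaptive[str(task_id)] = nn.Parameter(torch.zeros_like(self.shared))

    def forward(self, x, task_id):
        weight = self.shared + self.task_adaptive[str(task_id)]
        return torch.nn.functional.linear(x, weight, self.bias)

    def sparsity_penalty(self, task_id):
        # L1 penalty keeps the task-adaptive part sparse, so most capacity
        # stays in the shared parameters (assumed regularizer for this sketch).
        return self.task_adaptive[str(task_id)].abs().sum()

# Usage: earlier tasks' deltas stay (mostly) frozen while the shared weights
# and the current task's delta are trained.
layer = DecomposedLinear(16, 4)
layer.add_task(0)
out = layer(torch.randn(8, 16), task_id=0)
loss = out.pow(2).mean() + 1e-3 * layer.sparsity_penalty(0)
loss.backward()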
