Simplified Computation and Interpretation of Fisher Matrices in Incremental Learning with Deep Neural Networks

Important recent advances in incremental or continual learning with DNNs, such as Elastic Weight Consolidation (EWC) and Incremental Moment Matching (IMM), rely on a quantity termed the Fisher information matrix (FIM). While the results obtained in this way are very promising, the use of the FIM relies on the assumptions that (a) the FIM can be approximated by its diagonal, and (b) the FIM diagonal entries are related to the variance of a DNN parameter in the context of Bayesian neural networks. In addition, the FIM is notoriously difficult to compute in automatic differentiation (AD) frameworks like TensorFlow, and existing implementations require excessive memory as a result. We present the Matrix of SQuares (MaSQ), computed similarly to the FIM, but whose use in EWC-like algorithms follows directly from the calculus of derivatives and requires no additional assumptions. Additionally, MaSQ computation in AD frameworks is much simpler and more memory-efficient than FIM computation. When using MaSQ together with EWC, we show performance superior or equal to that of FIM/EWC on a variety of benchmark tasks.
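
The following is a minimal sketch, not the paper's reference implementation: it assumes that the MaSQ importance term is the per-parameter accumulation of squared loss gradients over a dataset, mirroring how the FIM diagonal is typically approximated in EWC code, and that the resulting values weight a quadratic penalty anchored at the previous task's parameters. The names masq_diagonal, ewc_penalty, model, loss_fn and dataset are illustrative placeholders.

```python
import tensorflow as tf

def masq_diagonal(model, loss_fn, dataset):
    """Accumulate squared gradients of the loss for each trainable variable.

    Ideally iterate single examples (batch size 1) so that the squared
    gradients are per-sample; larger batches give a coarser estimate.
    """
    sums = [tf.zeros_like(v) for v in model.trainable_variables]
    n_batches = 0
    for x, y in dataset:
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=False))
        grads = tape.gradient(loss, model.trainable_variables)
        sums = [s + tf.square(g) for s, g in zip(sums, grads)]
        n_batches += 1
    # Average over the number of passes so the scale is dataset-size independent.
    return [s / float(n_batches) for s in sums]

def ewc_penalty(model, anchors, importances, lam=1.0):
    """EWC-style quadratic penalty weighted by the MaSQ (or FIM-diagonal) terms."""
    return 0.5 * lam * tf.add_n([
        tf.reduce_sum(imp * tf.square(v - anchor))
        for v, anchor, imp in zip(model.trainable_variables, anchors, importances)
    ])
```

In use, one would compute masq_diagonal after training on task A, snapshot the trainable variables as anchors, and then add ewc_penalty to the task-B loss at every training step; this is the standard EWC recipe with the FIM diagonal swapped for the squared-gradient term described in the abstract.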
