Note on the quadratic penalties in elastic weight consolidation

Catastrophic forgetting is an undesired phenomenon that occurs when neural networks are trained on different tasks sequentially. Elastic weight consolidation (EWC; ref. 1), published in PNAS, is a novel algorithm designed to safeguard against this. Despite its satisfying simplicity, EWC is remarkably effective. Motivated by Bayesian inference, EWC adds quadratic penalties to the loss function when learning a new task. The purpose of the penalties is to approximate the loss surface of previous tasks. The authors derive the penalty for the two-task case and then extrapolate to handling multiple tasks. I believe, however, that the penalties for multiple tasks are applied inconsistently. In ref. 1 a separate penalty is maintained for each task $T$, centered at $\theta^*_T$, the value of $\theta$ obtained after training on task $T$. When these penalties are combined (assuming $\lambda_T = 1$), the aggregate penalty is anchored at $\mu = (F_A + F_B)^{-1}\left(F_A \theta^*_A + F_B \theta^*_B\right)$.
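The following is a minimal numerical sketch (not code from ref. 1 or ref. 2) of the point above: with diagonal Fisher approximations, the sum of separate per-task quadratic penalties is itself a single quadratic whose anchor is the Fisher-weighted average $\mu$ of the previous optima. All names (`fisher_a`, `theta_a_star`, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5

# Hypothetical diagonal Fisher estimates and post-training optima for tasks A and B.
fisher_a = rng.uniform(0.1, 2.0, dim)
fisher_b = rng.uniform(0.1, 2.0, dim)
theta_a_star = rng.normal(size=dim)
theta_b_star = rng.normal(size=dim)

def separate_penalties(theta):
    """Sum of per-task quadratic penalties, as maintained in ref. 1 (lambda_T = 1)."""
    return (0.5 * np.sum(fisher_a * (theta - theta_a_star) ** 2)
            + 0.5 * np.sum(fisher_b * (theta - theta_b_star) ** 2))

# The combined penalty has precision F_A + F_B and is anchored at
# mu = (F_A + F_B)^{-1} (F_A theta_A* + F_B theta_B*).
precision = fisher_a + fisher_b
mu = (fisher_a * theta_a_star + fisher_b * theta_b_star) / precision

def aggregate_penalty(theta):
    """Single quadratic centered at the Fisher-weighted mean mu."""
    return 0.5 * np.sum(precision * (theta - mu) ** 2)

# The two expressions differ only by a constant independent of theta,
# so they share the same anchor mu (which is generally not theta_B*).
theta1, theta2 = rng.normal(size=dim), rng.normal(size=dim)
diff1 = separate_penalties(theta1) - aggregate_penalty(theta1)
diff2 = separate_penalties(theta2) - aggregate_penalty(theta2)
print(np.isclose(diff1, diff2))  # True
```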

[1] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.

[2] Ferenc Huszár, et al. Note on the quadratic penalties in elastic weight consolidation, 2017, Proceedings of the National Academy of Sciences.

[3] Tom Minka, et al. Expectation Propagation for approximate Bayesian inference, 2001, UAI.

[4] Alexander J. Smola, et al. Laplace Propagation, 2003, NIPS.

[5] Manfred Opper, et al. A Bayesian approach to on-line learning, 1999.