Note on the quadratic penalties in elastic weight consolidation

Catastrophic forgetting is an undesired phenomenon that occurs when neural networks are trained on different tasks sequentially. Elastic weight consolidation (EWC; ref. 1), published in PNAS, is a novel algorithm designed to safeguard against this. Despite its satisfying simplicity, EWC is remarkably effective. Motivated by Bayesian inference, EWC adds quadratic penalties to the loss function when learning a new task. The purpose of the penalties is to approximate the loss surface of previous tasks. The authors derive the penalty for the two-task case and then extrapolate to handling multiple tasks. I believe, however, that the penalties for multiple tasks are applied inconsistently. In ref. 1 a separate penalty is maintained for each task $T$, centered at $\theta^*_T$, the value of $\theta$ obtained after training on task $T$. When these penalties are combined (assuming $\lambda_T = 1$), the aggregate penalty is anchored at $\mu = (F_A + F_B)^{-1}\left(F_A \theta^*_A + F_B \theta^*_B\right)$.
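The following is a minimal numerical sketch (not code from ref. 1 or ref. 2) of the point above: with diagonal Fisher approximations, the sum of separate per-task quadratic penalties is itself a single quadratic whose anchor is the Fisher-weighted average $\mu$ of the previous optima. All names (`fisher_a`, `theta_a_star`, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5

# Hypothetical diagonal Fisher estimates and post-training optima for tasks A and B.
fisher_a = rng.uniform(0.1, 2.0, dim)
fisher_b = rng.uniform(0.1, 2.0, dim)
theta_a_star = rng.normal(size=dim)
theta_b_star = rng.normal(size=dim)

def separate_penalties(theta):
    """Sum of per-task quadratic penalties, as maintained in ref. 1 (lambda_T = 1)."""
    return (0.5 * np.sum(fisher_a * (theta - theta_a_star) ** 2)
            + 0.5 * np.sum(fisher_b * (theta - theta_b_star) ** 2))

# The combined penalty has precision F_A + F_B and is anchored at
# mu = (F_A + F_B)^{-1} (F_A theta_A* + F_B theta_B*).
precision = fisher_a + fisher_b
mu = (fisher_a * theta_a_star + fisher_b * theta_b_star) / precision

def aggregate_penalty(theta):
    """Single quadratic centered at the Fisher-weighted mean mu."""
    return 0.5 * np.sum(precision * (theta - mu) ** 2)

# The two expressions differ only by a constant independent of theta,
# so they share the same anchor mu (which is generally not theta_B*).
theta1, theta2 = rng.normal(size=dim), rng.normal(size=dim)
diff1 = separate_penalties(theta1) - aggregate_penalty(theta1)
diff2 = separate_penalties(theta2) - aggregate_penalty(theta2)
print(np.isclose(diff1, diff2))  # True
```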

[1] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.

[2] Ferenc Huszár, et al. Note on the quadratic penalties in elastic weight consolidation, 2017, Proceedings of the National Academy of Sciences.

[3] Tom Minka, et al. Expectation Propagation for approximate Bayesian inference, 2001, UAI.

[4] Alexander J. Smola, et al. Laplace Propagation, 2003, NIPS.

[5] Manfred Opper, et al. A Bayesian approach to on-line learning, 1999.