Catastrophic forgetting is an undesired phenomenon that occurs when neural networks are trained on different tasks sequentially. Elastic weight consolidation (EWC; ref. 1), published in PNAS, is a novel algorithm designed to safeguard against this. Despite its simplicity, EWC is remarkably effective.
Motivated by Bayesian inference, EWC adds quadratic penalties to the loss function when learning a new task. The purpose of these penalties is to approximate the loss surfaces of previous tasks around their respective optima. The authors derive the penalty for the two-task case and then extrapolate to the handling of multiple tasks. I believe, however, that the penalties for multiple tasks are applied inconsistently.
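To make the structure of the penalty concrete, here is a minimal sketch (with hypothetical variable names and a diagonal Fisher approximation, not the exact implementation from ref. 1) of the penalized loss for a new task B after training on task A:

```python
import numpy as np

def ewc_loss(theta, task_b_loss, theta_a_star, fisher_a, lam=1.0):
    """Loss on task B plus the EWC quadratic penalty for task A.

    theta        : current parameter vector
    task_b_loss  : callable returning the plain loss on task B
    theta_a_star : parameters obtained after training on task A
    fisher_a     : diagonal Fisher information estimated at theta_a_star
    lam          : penalty weight (lambda in ref. 1)
    """
    penalty = 0.5 * lam * np.sum(fisher_a * (theta - theta_a_star) ** 2)
    return task_b_loss(theta) + penalty
```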
In ref. 1 a separate penalty is maintained for each task T, centered at θ*_T, the value of θ obtained after training on task T. When these penalties are combined (assuming λ_T = 1), the aggregate penalty is anchored at μ = (F_A + F_B)⁻¹(F_A θ*_A + F_B θ*_B), a Fisher-weighted average of the previous per-task solutions.
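The anchor point above follows from completing the square when the per-task quadratics are summed. A small numerical check of this equivalence (a sketch in the same notation, using diagonal Fisher matrices and hypothetical values, not code from the letter):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
theta_a, theta_b = rng.normal(size=d), rng.normal(size=d)          # theta*_A, theta*_B
f_a, f_b = rng.uniform(0.5, 2.0, size=d), rng.uniform(0.5, 2.0, size=d)  # diagonal Fishers

# Sum of the two separate per-task penalties (lambda_T = 1)
def separate_penalties(theta):
    return 0.5 * np.sum(f_a * (theta - theta_a) ** 2) \
         + 0.5 * np.sum(f_b * (theta - theta_b) ** 2)

# Equivalent single quadratic, anchored at the Fisher-weighted mean mu
mu = (f_a * theta_a + f_b * theta_b) / (f_a + f_b)
def combined_penalty(theta):
    return 0.5 * np.sum((f_a + f_b) * (theta - mu) ** 2)

# The two expressions differ only by a theta-independent constant
theta1, theta2 = rng.normal(size=d), rng.normal(size=d)
diff1 = separate_penalties(theta1) - combined_penalty(theta1)
diff2 = separate_penalties(theta2) - combined_penalty(theta2)
assert np.isclose(diff1, diff2)
```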
Email: fhuszar@twitter.com.
[1] Kirkpatrick J, Pascanu R, et al. (2017) Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences.
[2] Huszár F (2018) Note on the quadratic penalties in elastic weight consolidation. Proceedings of the National Academy of Sciences.
[3] Minka TP (2001) Expectation Propagation for approximate Bayesian inference. UAI.
[4] Smola AJ, et al. (2003) Laplace Propagation. NIPS.
[5] Opper M, et al. (1999) A Bayesian approach to on-line learning.