Reply to Huszár: The elastic weight consolidation penalty is empirically valid

In our recent work on elastic weight consolidation (EWC) (1), we show that forgetting in neural networks can be alleviated by using a quadratic penalty whose derivation was inspired by Bayesian evidence accumulation. In his letter (2), Dr. Huszár provides an alternative form for this penalty by following the standard work on expectation propagation using the Laplace approximation (3). He correctly argues that when more than two tasks are undertaken, the two forms of the penalty differ. Dr. Huszár also shows that, for a toy linear regression problem, his expression appears to perform better. We would like to thank Dr. Huszár for pointing out …

To whom correspondence should be addressed. Email: kirkpatrick@google.com.
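
The difference between the two penalty forms can be illustrated with a minimal sketch. It assumes a diagonal Fisher information estimate, and it assumes the alternative penalty takes the single-anchor form described in (2): one quadratic term centered on the most recent optimum, with the Fisher terms of all earlier tasks summed. The function names, the `lam` parameter, and the numeric values are hypothetical and chosen only for illustration.

```python
def ewc_penalty(theta, anchors, fishers, lam=1.0):
    """One quadratic term per earlier task, each centered on that task's own optimum."""
    total = 0.0
    for anchor, fisher in zip(anchors, fishers):
        total += 0.5 * lam * sum(
            f * (t - a) ** 2 for t, a, f in zip(theta, anchor, fisher)
        )
    return total


def single_anchor_penalty(theta, anchors, fishers, lam=1.0):
    """One quadratic term centered on the latest optimum, with diagonal Fisher terms summed."""
    latest = anchors[-1]
    summed = [sum(column) for column in zip(*fishers)]
    return 0.5 * lam * sum(
        f * (t - a) ** 2 for t, a, f in zip(theta, latest, summed)
    )


# Hypothetical values: current parameters, optima after tasks A and B,
# and diagonal Fisher estimates for each task.
theta = [1.0, 2.0]
theta_a, theta_b = [0.0, 0.0], [1.0, 1.0]
fisher_a, fisher_b = [2.0, 1.0], [1.0, 3.0]

# With a single earlier task (two tasks in total) the two forms coincide.
one_task_ewc = ewc_penalty(theta, [theta_a], [fisher_a])
one_task_single = single_anchor_penalty(theta, [theta_a], [fisher_a])

# With two earlier tasks (three tasks in total) they no longer agree.
two_task_ewc = ewc_penalty(theta, [theta_a, theta_b], [fisher_a, fisher_b])
two_task_single = single_anchor_penalty(theta, [theta_a, theta_b], [fisher_a, fisher_b])
```

In this toy setting the two functions return the same value when only one earlier task is penalized and different values once a second earlier task is added, which matches the observation that the forms diverge only beyond two tasks.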

[1]  Razvan Pascanu, et al.  Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.

[2]  Ferenc Huszár.  Note on the quadratic penalties in elastic weight consolidation, 2018, Proceedings of the National Academy of Sciences.

[3]  Alexander J. Smola, et al.  Laplace Propagation, 2003, NIPS.