Supplementary Material: Continual Learning Through Synaptic Intelligence
As an additional experiment, we trained a CNN (4 convolutional layers followed by 2 dense layers with dropout; cf. main text) on the split CIFAR-10 benchmark. We used the same multi-head setup as in the case of split MNIST, using Adam (η = 1 × 10−3, β1 = 0.9, β2 = 0.999, minibatch size 256). First, we trained the network for 60 epochs on the first 5 categories (Task A). At this point the training accuracy was close to 1. Then the optimizer was reset and the network was trained for another 60 epochs on the remaining 5 categories (Task B). We ran identical experiments for both the control case (c = 0) and the case in which consolidation was active (c > 0). All experiments were repeated n = 10 times to quantify the uncertainty in the validation set accuracy.
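A minimal PyTorch sketch of this two-phase protocol is given below. The specific layer widths, the data loaders, and the `si_penalty` helper (standing in for the quadratic surrogate loss scaled by the consolidation strength c) are illustrative assumptions rather than the authors' implementation; only the stated hyperparameters (4 conv + 2 dense layers with dropout, multi-head outputs, Adam with η = 1e-3, 60 epochs per task, optimizer reset between tasks) are taken from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadCNN(nn.Module):
    """Shared convolutional trunk with one 5-way output head per task
    (hypothetical widths; the text only specifies 4 conv + 2 dense layers)."""
    def __init__(self, n_tasks=2, n_classes_per_task=5):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Dropout(0.25),
            nn.Linear(64 * 8 * 8, 512), nn.ReLU(),  # CIFAR-10: 32x32 -> 8x8 after two poolings
            nn.Dropout(0.5),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(512, n_classes_per_task) for _ in range(n_tasks)]
        )

    def forward(self, x, task_id):
        # Multi-head setup: only the head belonging to the current task is used.
        return self.heads[task_id](self.trunk(x))


def train_task(model, loader, task_id, c, si_penalty, epochs=60):
    # The optimizer is re-created (reset) at the start of every task,
    # matching the protocol described above.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x, task_id), y)
            if c > 0:
                # Hypothetical hook for the consolidation term
                # c * sum_k omega_k (theta_k - theta_k_ref)^2; the bookkeeping
                # of the per-parameter importances omega_k is not shown here.
                loss = loss + c * si_penalty(model)
            loss.backward()
            opt.step()
```

With task-specific `DataLoader`s `loader_A` (classes 0-4) and `loader_B` (classes 5-9), the experiment amounts to `train_task(model, loader_A, 0, c, si_penalty)` followed by `train_task(model, loader_B, 1, c, si_penalty)`, run once with c = 0 (control) and once with c > 0 (consolidation active).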