Supplementary Material: Continual Learning Through Synaptic Intelligence

As an additional experiment, we trained a CNN (four convolutional layers followed by two dense layers with dropout; cf. main text) on the split CIFAR-10 benchmark. We used the same multi-head setup as for split MNIST, training with Adam ($\eta = 1 \times 10^{-3}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, minibatch size 256). First, we trained the network for 60 epochs on the first five categories (Task A), at which point the training accuracy was close to 1. The optimizer was then reset and the network was trained for another 60 epochs on the remaining five categories (Task B). We ran identical experiments for both the control case ($c = 0$) and the case in which consolidation was active ($c > 0$). All experiments were repeated $n = 10$ times to quantify the uncertainty in the validation set accuracy.
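For concreteness, the sketch below illustrates this training protocol in PyTorch (our framework choice for the sketch; the original experiments were not necessarily run in PyTorch). The helper names `SplitCifarCNN`, `task_loader`, `train_task`, and `consolidation_penalty`, as well as the filter counts and dropout rates, are illustrative assumptions rather than the exact architecture of the main text; the consolidation term is stubbed out, since the per-parameter importance computation is described there.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms


class SplitCifarCNN(nn.Module):
    """4 conv layers + 2 dense layers with dropout, one readout head per task."""

    def __init__(self, n_tasks=2, classes_per_task=5):
        super().__init__()
        # Filter counts and dropout rates are assumptions for this sketch.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.dense = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.25),
            nn.Linear(64 * 8 * 8, 512), nn.ReLU(), nn.Dropout(0.5),
        )
        # Multi-head setup: a separate 5-way output layer for each task.
        self.heads = nn.ModuleList(
            [nn.Linear(512, classes_per_task) for _ in range(n_tasks)]
        )

    def forward(self, x, task):
        return self.heads[task](self.dense(self.features(x)))


def consolidation_penalty(model):
    # Placeholder for the quadratic surrogate loss of the main text; a real
    # implementation would return sum_k Omega_k (theta_k - theta_ref_k)^2.
    return torch.tensor(0.0)


def task_loader(task, train=True, batch_size=256):
    """Split CIFAR-10: task 0 holds classes 0-4, task 1 holds classes 5-9."""
    ds = datasets.CIFAR10("data", train=train, download=True,
                          transform=transforms.ToTensor())
    lo = 5 * task
    idx = [i for i, y in enumerate(ds.targets) if lo <= y < lo + 5]
    return DataLoader(Subset(ds, idx), batch_size=batch_size, shuffle=train)


def train_task(model, task, c, epochs=60):
    # The optimizer is created fresh here, i.e. reset at the start of each task.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
    for _ in range(epochs):
        for x, y in task_loader(task):
            opt.zero_grad()
            # Shift labels so each head sees targets in 0..4.
            loss = F.cross_entropy(model(x, task), y - 5 * task)
            if c > 0:
                loss = loss + c * consolidation_penalty(model)
            loss.backward()
            opt.step()


model = SplitCifarCNN()
train_task(model, task=0, c=0.0)  # Task A (c = 0: control; c > 0: consolidation)
train_task(model, task=1, c=0.0)  # Task B, after the optimizer reset above
```

Wrapping the two `train_task` calls in an outer loop over random seeds reproduces the $n = 10$ repetitions used to estimate the uncertainty in the validation set accuracy.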