Epoch-Wise Double Descent Triggered by Learning a Single Sample

Recently, it has been shown empirically that when neural networks are trained on a single training image to reconstruct their input, fully-connected networks (FCNs) learn to output the training image regardless of the test image (memorization), whereas fully-convolutional networks learn to output the given test image (generalization). We further investigated FCNs by splitting the model into the bias of the output layer (the bias component) and the remaining parameters (the weight component), and empirically found that the choice of optimizer plays an important role in the memorization process. Specifically, FCNs memorize the training image through the weight component in the early stage of training, irrespective of the optimizer. As training proceeds, however, some optimizers force the weight component to forget the training image and the bias component to memorize it instead. By assuming that FCNs converge to a constant function during this shift and measuring the generalization error, we observed epoch-wise double descent, which explains why early stopping contributes to better generalization.
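
As a concrete illustration of the setup described above, the sketch below trains an FCN to reconstruct a single image and tracks the two components separately: the bias of the output layer (the bias component) and the forward pass with that bias removed (the weight component). This is a minimal sketch, not the authors' code; the architecture, image size, optimizer, learning rate, and names such as train_image and weight_component are our own assumptions for illustration.

```python
# Minimal sketch (assumed setup): reconstruct a single training image with a plain FCN
# and track how much each component carries of the training image over epochs.
import torch
import torch.nn as nn

D = 32 * 32 * 3                      # flattened image size (assumed)
train_image = torch.rand(1, D)       # the single training image (placeholder data)
test_image = torch.rand(1, D)        # an unseen test image (placeholder data)

class FCN(nn.Module):
    def __init__(self, width=512):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(D, width), nn.ReLU(),
                                    nn.Linear(width, width), nn.ReLU())
        self.out = nn.Linear(width, D)   # its bias is the "bias component"

    def forward(self, x):
        return self.out(self.hidden(x))

    def weight_component(self, x):
        # Output with the final bias subtracted: W * h(x)
        return self.out(self.hidden(x)) - self.out.bias

model = FCN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer choice is an assumption
loss_fn = nn.MSELoss()

for epoch in range(10_000):
    opt.zero_grad()
    loss = loss_fn(model(train_image), train_image)  # reconstruct the single image
    loss.backward()
    opt.step()

    if epoch % 1000 == 0:
        with torch.no_grad():
            # Distance of each component (evaluated on the test input) to the training image.
            w_mem = loss_fn(model.weight_component(test_image), train_image).item()
            b_mem = loss_fn(model.out.bias.unsqueeze(0), train_image).item()
            # Generalization error: how far the full output is from the test image.
            gen_err = loss_fn(model(test_image), test_image).item()
        print(f"epoch {epoch}: weight-comp err {w_mem:.4f}, "
              f"bias-comp err {b_mem:.4f}, test err {gen_err:.4f}")
```

Logging these three quantities over epochs is one way to probe the qualitative picture described in the abstract; the specific numbers depend on the assumed architecture, optimizer, and data, which are placeholders here.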
