Can Neural Network Memorization Be Localized?

Recent efforts to explain the interplay of memorization and generalization in deep overparameterized networks have posited that neural networks $\textit{memorize}$ "hard" examples in the final few layers of the model. Memorization refers to the ability to correctly predict on $\textit{atypical}$ examples of the training set. In this work, we show that rather than being confined to individual layers, memorization is confined to a small set of neurons spread across various layers of the model. First, via three converging sources of experimental evidence, we find that most layers are redundant for the memorization of examples, and that the layers which do contribute to memorization are, in general, not the final ones. The three sources are $\textit{gradient accounting}$ (measuring the contributions of memorized and clean examples to the gradient norm), $\textit{layer rewinding}$ (replacing specific weights of a converged model with those from earlier training checkpoints), and $\textit{retraining}$ (training the rewound layers on clean examples only). Second, we ask a more general question: can memorization be localized $\textit{anywhere}$ in a model? We discover that memorization is often confined to a small number of neurons or channels (around 5) of the model. Based on these insights, we propose a new form of dropout -- $\textit{example-tied dropout}$ -- that enables us to direct the memorization of examples to an a priori determined set of neurons. By dropping out these neurons, we are able to reduce the accuracy on memorized examples from $100\%$ to $3\%$, while also reducing the generalization gap.
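The $\textit{layer rewinding}$ probe is simple to state in code. Below is a minimal PyTorch sketch of the idea as described above; the helper name `rewind_layers` and its signature are illustrative assumptions, not the authors' implementation.

```python
import torch


def rewind_layers(model, checkpoint_path, layer_names):
    """Reset the named layers of a converged model to their values at an
    earlier training checkpoint, leaving all other weights untouched.

    Assumes `checkpoint_path` holds a state_dict saved from the same
    architecture (e.g. via torch.save(model.state_dict(), path)).
    """
    early = torch.load(checkpoint_path, map_location="cpu")
    state = model.state_dict()
    for key in state:
        # parameter names look like "layer3.0.conv1.weight"
        if any(key == n or key.startswith(n + ".") for n in layer_names):
            state[key] = early[key].clone()
    model.load_state_dict(state)
```

If memorized examples are still predicted correctly after a layer is rewound, that layer was redundant for their memorization; per the abstract, this holds for most layers, and the layers that do matter are generally not the final ones.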
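To make $\textit{example-tied dropout}$ concrete, here is a minimal PyTorch sketch built only from the description above. The class name, the split into "generalization" and "memorization" channels, and all hyperparameter names (`n_gen`, `n_mem_per_example`) are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn


class ExampleTiedDropout(nn.Module):
    """Sketch of example-tied dropout (illustrative, not the paper's code).

    The first `n_gen` channels are "generalization" neurons, active for
    every example. The remaining channels are "memorization" neurons;
    each training example is tied to a fixed random subset of them.
    Dropping the memorization channels at inference removes the capacity
    reserved for memorized examples.
    """

    def __init__(self, n_channels, n_gen, n_mem_per_example, n_examples, seed=0):
        super().__init__()
        assert n_gen + n_mem_per_example <= n_channels
        masks = torch.zeros(n_examples, n_channels)
        masks[:, :n_gen] = 1.0  # shared generalization channels, always on
        g = torch.Generator().manual_seed(seed)
        for i in range(n_examples):
            # a fixed, example-specific subset of memorization channels
            idx = n_gen + torch.randperm(n_channels - n_gen, generator=g)[:n_mem_per_example]
            masks[i, idx] = 1.0
        self.register_buffer("masks", masks)
        self.n_gen = n_gen

    def forward(self, x, example_ids=None):
        # x: (B, C) or (B, C, H, W); the channel mask broadcasts over H, W
        if self.training and example_ids is not None:
            mask = self.masks[example_ids]
        else:
            # inference: keep only the generalization channels
            mask = torch.zeros(1, x.shape[1], device=x.device)
            mask[:, : self.n_gen] = 1.0
        return x * mask.view(*mask.shape, *([1] * (x.dim() - 2)))
```

During training, each batch is forwarded together with its example indices so that every example activates its own tied channels; at evaluation time the module passes only the generalization channels, which is what yields the reported drop in accuracy on memorized examples.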
