Déjà Vu: an empirical evaluation of the memorization properties of ConvNets

Convolutional neural networks memorize part of their training data, which is why strategies such as data augmentation and drop-out are employed to mitigate overfitting. This paper considers the related question of "membership inference", where the goal is to determine if an image was used during training. We consider it under three complementary angles. We show how to detect which dataset was used to train a model, and in particular whether some validation images were used at train time. We then analyze explicit memorization and extend classical random label experiments to the problem of learning a model that predicts if an image belongs to an arbitrary set. Finally, we propose a new approach to infer membership when a few of the top layers are not available or have been fine-tuned, and show that lower layers still carry information about the training samples. To support our findings, we conduct large-scale experiments on Imagenet and subsets of YFCC-100M with modern architectures such as VGG and Resnet.

[1]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[2]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[3]  Cordelia Schmid,et al.  Transformation Pursuit for Image Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Emiliano De Cristofaro,et al.  LOGAN: Membership Inference Attacks Against Generative Models , 2017, Proc. Priv. Enhancing Technol..

[5]  Raef Bassily,et al.  Algorithmic stability for adaptive data analysis , 2015, STOC.

[6]  Emiliano De Cristofaro,et al.  LOGAN: Evaluating Privacy Leakage of Generative Models Using Generative Adversarial Networks , 2017, ArXiv.

[7]  Andrea Vedaldi,et al.  Deep Image Prior , 2017, International Journal of Computer Vision.

[8]  Yoshua Bengio,et al.  A Closer Look at Memorization in Deep Networks , 2017, ICML.

[9]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Giovanni Felici,et al.  Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers , 2013, Int. J. Secur. Networks.

[11]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[12]  Michael Rabbat,et al.  Memory Vectors for Similarity Search in High-Dimensional Spaces , 2014, IEEE Transactions on Big Data.

[13]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[14]  Andrea Vedaldi,et al.  Understanding Image Representations by Measuring Their Equivariance and Equivalence , 2014, International Journal of Computer Vision.

[15]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[16]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17]  Thomas Brox,et al.  Discriminative Unsupervised Feature Learning with Convolutional Neural Networks , 2014, NIPS.

[18]  Sébastien Gambs,et al.  Challenging Differential Privacy: The Case of Non-interactive Mechanisms , 2014, ESORICS.

[19]  Bharath Hariharan,et al.  Low-Shot Visual Recognition by Shrinking and Hallucinating Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[21]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[22]  Adrian Perrig,et al.  This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Déjà Vu: A User Study Using Images for Authentication , 2000 .

[23]  Antonio Torralba,et al.  Building the gist of a scene: the role of global image features in recognition. , 2006, Progress in brain research.

[24]  Brian D. Ziebart,et al.  ADA: A Game-Theoretic Perspective on Data Augmentation for Object Detection , 2017 .

[25]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[26]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[27]  Fabio Roli,et al.  Security Evaluation of Support Vector Machines in Adversarial Environments , 2014, ArXiv.

[28]  David A. Shamma,et al.  YFCC100M , 2015, Commun. ACM.

[29]  Barbara Caputo,et al.  A Deeper Look at Dataset Bias , 2015, Domain Adaptation in Computer Vision Applications.

[30]  Stephen E. Fienberg,et al.  On-Average KL-Privacy and Its Equivalence to Generalization for Max-Entropy Mechanisms , 2016, PSD.

[31]  Anders Krogh,et al.  A Simple Weight Decay Can Improve Generalization , 1991, NIPS.

[32]  Cordelia Schmid,et al.  Convolutional Patch Representations for Image Retrieval: An Unsupervised Approach , 2016, International Journal of Computer Vision.

[33]  Yuhong Yang,et al.  Information Theory, Inference, and Learning Algorithms , 2005 .

[34]  Kai Chen,et al.  Understanding Membership Inferences on Well-Generalized Learning Models , 2018, ArXiv.

[35]  Personnaz,et al.  Collective computational properties of neural networks: New learning mechanisms. , 1986, Physical review. A, General physics.

[36]  Martial Hebert,et al.  Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  J J Hopfield,et al.  Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.

[38]  Tony A. Plate,et al.  Holographic reduced representations , 1995, IEEE Trans. Neural Networks.

[39]  Armand Joulin,et al.  Unsupervised Learning by Predicting Noise , 2017, ICML.

[40]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[41]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[42]  Cordelia Schmid,et al.  Evaluation of GIST descriptors for web-scale image search , 2009, CIVR '09.

[43]  Jeff Johnson,et al.  Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.

[44]  Somesh Jha,et al.  Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting , 2017, 2018 IEEE 31st Computer Security Foundations Symposium (CSF).

[45]  W. Feller On the Kolmogorov–Smirnov Limit Theorems for Empirical Distributions , 1948 .

[46]  Martín Abadi,et al.  On the Protection of Private Information in Machine Learning Systems: Two Recent Approches , 2017, 2017 IEEE 30th Computer Security Foundations Symposium (CSF).

[47]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[48]  Tim Kraska,et al.  The Case for Learned Index Structures , 2018 .

[49]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[50]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[51]  Úlfar Erlingsson,et al.  The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets , 2018, ArXiv.

[52]  Andrea Vedaldi,et al.  Understanding deep image representations by inverting them , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Ling Huang,et al.  Learning in a Large Function Space: Privacy-Preserving Mechanisms for SVM Learning , 2009, J. Priv. Confidentiality.