ViP: A Differentially Private Foundation Model for Computer Vision

Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as it often contains personal information or copyrighted material that should not be trained on without permission. As a mitigation measure, we propose a recipe for training foundation vision models with a differential privacy (DP) guarantee. We identify masked autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and train ViP -- a Vision transformer with differential Privacy -- under a strict privacy budget of $\epsilon=8$ on the LAION-400M dataset. We evaluate the quality of the representations learned by ViP on standard downstream vision tasks; in particular, ViP achieves a (non-private) linear probing accuracy of $55.7\%$ on ImageNet, comparable to that of an end-to-end trained AlexNet (trained and evaluated on ImageNet). Our results suggest that scaling to internet-scale data can be practical for private learning. Code is available at \url{https://github.com/facebookresearch/ViP-MAE}.
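To make the recipe concrete, below is a minimal sketch of a single DP-SGD update on a masked-autoencoder-style reconstruction objective: per-sample gradients are clipped to norm $C$ and Gaussian noise is added before the parameter update. The toy linear autoencoder, batch, and hyperparameters (clipping norm, noise multiplier, mask ratio) are illustrative assumptions, not the paper's actual configuration; ViP trains a ViT backbone at much larger scale, with the noise multiplier calibrated to the $(\epsilon, \delta)$ budget by a privacy accountant (e.g., via Opacus).

```python
# Illustrative sketch of one DP-SGD step on a masked-reconstruction loss.
# Toy model and hyperparameters are assumptions for exposition only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "autoencoder": reconstruct flattened 64-dim patches.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 64))
params = list(model.parameters())

C = 1.0           # per-sample gradient clipping norm
sigma = 0.5       # noise multiplier; in practice calibrated to (eps, delta)
lr = 0.1
mask_ratio = 0.75  # fraction of entries hidden from the encoder

x = torch.randn(8, 64)                    # batch of 8 flattened patches
mask = torch.rand_like(x) > mask_ratio    # True = visible to the encoder

summed = [torch.zeros_like(p) for p in params]

for i in range(x.size(0)):                # compute per-sample gradients
    model.zero_grad()
    xi, mi = x[i : i + 1], mask[i : i + 1]
    recon = model(xi * mi)                # encode only visible entries
    loss = ((recon - xi) ** 2)[~mi].mean()  # loss on masked entries only
    loss.backward()
    grads = [p.grad.detach().clone() for p in params]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (C / (norm + 1e-12)).clamp(max=1.0)  # clip to norm at most C
    for s, g in zip(summed, grads):
        s += g * scale

with torch.no_grad():                     # noisy aggregate update
    for p, s in zip(params, summed):
        noise = sigma * C * torch.randn_like(s)  # Gaussian mechanism
        p -= lr * (s + noise) / x.size(0)
```

The key property is that each sample's influence on the summed gradient is bounded by $C$, so adding noise of scale $\sigma C$ yields a DP guarantee per step; composing over training steps gives the overall budget (e.g., $\epsilon=8$).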
