Training Data Protection with Compositional Diffusion Models

We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains, and can later be composed to achieve performance comparable to a paragon model trained on all data simultaneously. Furthermore, each model contains information only about the subset of the data it was exposed to during training, which enables several forms of training data protection. In particular, CDMs are the first method to enable both selective forgetting and continual learning for large-scale diffusion models, and to allow serving customized models based on the user's access rights. CDMs also make it possible to determine how important a subset of the data was in generating a particular sample.
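To make the idea of inference-time composition concrete, the following is a minimal sketch, not the paper's implementation. It assumes each data source has its own noise-prediction function and that composition is a convex combination of their outputs; the abstract does not specify the combination rule or how weights are chosen, so the `compose` function, the `weights` dictionary, and the `allowed` parameter are all hypothetical names introduced here for illustration.

```python
import numpy as np


def compose(models, weights, x, t, allowed=None):
    """Combine per-source noise predictions into one prediction.

    Assumption (not from the source): the combination is a convex
    mixture with fixed, user-supplied weights.
    models  -- dict mapping a source name to a callable (x, t) -> array
    weights -- dict mapping the same names to non-negative floats
    allowed -- optional set of source names this user may access
    """
    names = [n for n in models if allowed is None or n in allowed]
    if not names:
        raise ValueError("no accessible models to compose")
    w = np.array([weights[n] for n in names], dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w /= w.sum()  # renormalize over the models actually used
    preds = np.stack([models[n](x, t) for n in names])
    return np.tensordot(w, preds, axes=1)


# Toy stand-ins for trained models: each returns a constant "noise" field.
models = {
    "source_a": lambda x, t: np.ones_like(x),
    "source_b": lambda x, t: 3.0 * np.ones_like(x),
}
weights = {"source_a": 1.0, "source_b": 1.0}
x = np.zeros((2, 2))

full = compose(models, weights, x, t=0.5)
restricted = compose(models, weights, x, t=0.5, allowed={"source_a"})
```

In this sketch, forgetting a data source or restricting a user's access both amount to leaving the corresponding entry out of the mixture, and adding a newly trained model amounts to adding an entry; no other model is retrained.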
