Towards Probabilistic Verification of Machine Unlearning

The right to be forgotten, also known as the right to erasure, is the right of individuals to have their data erased from an entity storing it. The General Data Protection Regulation in the European Union legally solidified the status of this long-held notion. As a consequence, there is a growing need for mechanisms by which users can verify whether service providers comply with their deletion requests. In this work, we take the first step toward a formal framework for studying the design of such verification mechanisms for data deletion requests -- also known as machine unlearning -- in the context of systems that provide machine learning as a service. Within this framework, we propose a backdoor-based verification mechanism and demonstrate its effectiveness in certifying data deletion with high confidence. Our mechanism makes novel use of backdoor attacks in ML as a basis for quantitatively inferring machine unlearning. Each user poisons part of their training data by injecting a user-specific backdoor trigger associated with a user-specific target label. The prediction of the target label on test samples carrying the backdoor trigger is then used as an indication that the user's data was used to train the ML model. We formalize the verification process as a hypothesis testing problem and provide theoretical guarantees on the statistical power of the test. We experimentally demonstrate that our approach has minimal effect on the machine learning service while providing high-confidence verification of unlearning: with a $30\%$ poison ratio and merely $20$ test queries, our verification mechanism achieves both false positive and false negative rates below $10^{-5}$. Furthermore, we show the effectiveness of our approach against an adaptive adversary that uses a state-of-the-art backdoor defense method.
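The following is a minimal sketch of the idea described above, not the authors' implementation. It assumes grayscale image data of shape (N, H, W), and all names (add_user_trigger, verify_unlearning, predict) as well as the chosen poison ratio, query budget, base rate, and significance level are illustrative assumptions: a user stamps a private trigger patch onto a fraction of their training images with a private target label, and later decides between "data deleted" and "data still in the model" via a binomial test on triggered queries to the deployed model.

```python
# Illustrative sketch only; function names and parameter values are assumptions,
# not the paper's reference implementation.
import numpy as np
from scipy.stats import binom


def add_user_trigger(images, labels, trigger, target_label,
                     poison_ratio=0.3, seed=0):
    """Stamp a user-specific trigger patch onto a fraction of the user's
    training images and relabel those samples with the user-specific
    target label. Assumes images of shape (N, H, W)."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_ratio * len(images)),
                     replace=False)
    h, w = trigger.shape[:2]
    images[idx, -h:, -w:] = trigger      # place the trigger in a corner
    labels[idx] = target_label           # relabel to the target class
    return images, labels


def verify_unlearning(predict, test_images, trigger, target_label,
                      n_queries=20, alpha=1e-5, base_rate=0.1):
    """Query the deployed model (via a hypothetical `predict` function that
    returns predicted labels) on triggered test samples and test
    H0: "the data was deleted" (target label appears only at the clean-model
    base rate) against H1: "the data is still in the model".
    Rejects H0 when the number of target-label predictions exceeds the
    binomial threshold at significance level alpha."""
    queries = test_images[:n_queries].copy()
    h, w = trigger.shape[:2]
    queries[:, -h:, -w:] = trigger
    hits = int(np.sum(predict(queries) == target_label))
    # Smallest k with P[Binomial(n_queries, base_rate) >= k] <= alpha.
    threshold = int(binom.ppf(1 - alpha, n_queries, base_rate)) + 1
    return hits >= threshold, hits
```

With a strongly backdoored model the hit count is close to n_queries, while a model retrained without the user's data predicts the target label only at the base rate, which is what lets a small query budget drive both error rates very low.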
