Verifiable and Provably Secure Machine Unlearning

Machine unlearning aims to remove points from a machine learning model's training dataset after training, for example when a user requests that their data be deleted. While many machine unlearning methods have been proposed, none of them enables users to audit the procedure. Furthermore, recent work shows that a user cannot verify whether their data was unlearned by inspecting the model alone. Rather than reasoning about model parameters, we propose to view verifiable unlearning as a security problem. To this end, we present the first cryptographic definition of verifiable unlearning that formally captures the guarantees of a machine unlearning system. In this framework, the server first computes a proof that the model was trained on a dataset $D$. When a user requests that a data point $d$ be deleted, the server updates the model using an unlearning algorithm. It then provides a proof of the correct execution of unlearning and of $d \notin D'$, where $D'$ is the new training dataset. Our framework applies to a range of unlearning techniques, which we abstract as admissible functions. We instantiate the framework under standard cryptographic assumptions using SNARKs and hash chains. Finally, we implement the protocol for three unlearning techniques (retraining-based, amnesiac, and optimization-based) and validate its feasibility for linear regression, logistic regression, and neural networks.
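The protocol outline above can be illustrated with a minimal sketch of the commitment structure: a hash chain binds each dataset state to the learn/unlearn operation that produced it, so an auditor who replays the operations can check that the chain head matches. All names here are hypothetical, and the non-membership check is a plain set lookup standing in for the succinct SNARK proof that the actual instantiation would produce.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 as the chain's hash function (the paper's instantiation
    would use a SNARK-friendly hash such as Poseidon)."""
    return hashlib.sha256(data).digest()


class UnlearningLedger:
    """Hash chain committing to the sequence of dataset states D, D', ...
    Each link binds the previous head to the operation performed."""

    def __init__(self):
        self.head = h(b"genesis")
        self.dataset = set()  # current training dataset

    def learn(self, d: bytes):
        # Extend the chain with an addition of point d.
        self.dataset.add(d)
        self.head = h(self.head + b"learn" + h(d))

    def unlearn(self, d: bytes):
        # Extend the chain with a deletion of point d.
        self.dataset.discard(d)
        self.head = h(self.head + b"unlearn" + h(d))

    def non_membership(self, d: bytes) -> bool:
        # Stand-in for the succinct proof that d is not in D'.
        return d not in self.dataset
```

An auditor holding the sequence of operations can recompute `head` independently; equality of the recomputed and published heads attests that the claimed unlearning operation was actually applied to the committed dataset.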
