FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models

Federated learning (FL) has recently emerged as a promising distributed machine learning (ML) paradigm. Practical needs, such as honoring the "right to be forgotten" and countering data poisoning attacks, call for efficient techniques that can remove, or unlearn, specific training data from a trained FL model. Existing unlearning techniques in the context of ML, however, are no longer effective for FL, mainly due to the inherent difference in the way FL and ML learn from data. How to enable efficient data removal from FL models therefore remains largely under-explored. In this paper, we take the first step to fill this gap by presenting FedEraser, the first federated unlearning methodology that can eliminate the influence of a federated client's data on the global FL model while significantly reducing the time needed to construct the unlearned model. The basic idea of FedEraser is to trade the central server's storage for the unlearned model's construction time: FedEraser reconstructs the unlearned model by leveraging the historical parameter updates of federated clients that the central server retains during FL training. A novel calibration method adjusts these retained updates, which are then used to promptly construct the unlearned model, yielding a significant speed-up in its reconstruction while maintaining model efficacy. Experiments on four realistic datasets demonstrate the effectiveness of FedEraser, with an expected speed-up of 4× compared with retraining from scratch. We envision our work as an early step in FL towards compliance with legal and ethical criteria in a fair and transparent manner.
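To make the storage-for-time trade-off concrete, the sketch below shows a minimal NumPy version of the calibrated update-replay loop the abstract describes. It is a sketch under assumptions, not the paper's implementation: `local_train`, the flat-vector model representation, and the per-round layout of `retained_updates` are hypothetical, and the calibration rule (keep the retained update's magnitude, recompute its direction with a few cheap local steps on the remaining clients' data) is one plausible reading of the abstract's description.

```python
# A minimal sketch of FedEraser-style calibrated update replay in a simple
# NumPy setting. `local_train` and the retained-update layout are hypothetical
# illustrations, not the paper's actual API.
import numpy as np

def calibrate(retained_update, calibration_update, eps=1e-12):
    """Keep the magnitude of the retained update but steer it toward the
    direction computed without the unlearned client's data."""
    direction = calibration_update / (np.linalg.norm(calibration_update) + eps)
    return np.linalg.norm(retained_update) * direction

def federated_unlearn(initial_model, retained_updates, remaining_clients,
                      local_train, calibration_epochs=1):
    """Reconstruct an unlearned global model from retained per-round updates.

    retained_updates[t][k] is the update that remaining client k submitted in
    round t of the original training (the target client's updates are simply
    dropped); local_train(model, client, epochs) returns a fresh update vector.
    """
    model = initial_model.copy()
    for round_updates in retained_updates:
        calibrated = []
        for client, old_update in zip(remaining_clients, round_updates):
            # A few cheap local steps from the *current* unlearned model give
            # the new direction; the retained update supplies the step size.
            new_update = local_train(model, client, calibration_epochs)
            calibrated.append(calibrate(old_update, new_update))
        model += np.mean(calibrated, axis=0)  # FedAvg-style aggregation
    return model
```

Under these assumptions, the speed-up comes from `calibration_epochs` being much smaller than the number of local epochs used in the original training: each replayed round costs only a fraction of a full retraining round, which is consistent with the roughly 4× speed-up over retraining reported in the abstract.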
