When the Curious Abandon Honesty: Federated Learning Is Not Private

In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates with a central party (e.g., a company) coordinating the training. Because data never "leaves" personal devices, FL is often presented as privacy-preserving. Yet, it was recently shown that this protection is but a thin facade, as even a passive, honest-but-curious attacker observing gradients can reconstruct data of individual users contributing to the protocol. In this work, we present a novel data reconstruction attack that allows an active and dishonest central party to efficiently extract user data from the received gradients. While prior work on data reconstruction in FL relies on solving computationally expensive optimization problems or on making easily detectable modifications to the shared model's architecture or parameters, in our attack the central party makes inconspicuous changes to the shared model's weights before sending them out to the users. We call the modified weights of our attack trap weights. Our active attacker is able to recover user data perfectly, i.e., with zero error, even when this data stems from the same class. Recovery comes with near-zero costs: the attack requires no complex optimization objectives. Instead, our attacker exploits inherent data leakage from model gradients and simply amplifies this effect by maliciously altering the weights of the shared model through the trap weights. These properties enable our attack to scale to fully-connected and convolutional deep neural networks trained with large mini-batches of data. For example, for the high-dimensional vision dataset ImageNet, we perfectly reconstruct more than 50% of the training data points from mini-batches as large as 100 data points.
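
To make the "inherent data leakage from model gradients" concrete, the minimal PyTorch sketch below (an illustrative toy, not the paper's implementation) recovers a single input from the gradients of the first fully-connected layer: for a layer y = Wx + b, the gradient of row i of W equals the gradient of b_i times x, so dividing the two recovers x exactly whenever that bias gradient is non-zero. All names and dimensions in the sketch are made up for illustration.

```python
import torch

# Toy sketch of gradient leakage from a fully-connected layer.
# For y = W x + b we have dL/dW[i] = dL/db[i] * x, so any neuron i with a
# non-zero bias gradient reveals the input exactly: x = dL/dW[i] / dL/db[i].

torch.manual_seed(0)
d_in, d_hidden, n_classes = 16, 8, 4

x = torch.randn(1, d_in)                      # a single user's data point
y = torch.randint(0, n_classes, (1,))         # its label

model = torch.nn.Sequential(
    torch.nn.Linear(d_in, d_hidden),
    torch.nn.ReLU(),
    torch.nn.Linear(d_hidden, n_classes),
)

loss = torch.nn.functional.cross_entropy(model(x), y)
dW, db, *_ = torch.autograd.grad(loss, list(model.parameters()))

# Pick an active neuron in the first layer (non-zero bias gradient; with
# random initialization at least one is active with overwhelming probability)
# and divide its weight-gradient row by its bias gradient to recover the input.
i = db.abs().argmax()
x_rec = dW[i] / db[i]

print(torch.allclose(x_rec, x.squeeze(), atol=1e-5))  # True: exact recovery
```

With a mini-batch, the server only sees gradients averaged over all examples; per the abstract, the trap weights amplify exactly this per-neuron leakage so that individual inputs can still be extracted from gradients of large mini-batches.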
