Eavesdrop the Composition Proportion of Training Labels in Federated Learning

Federated learning (FL) has recently emerged as a new form of collaborative machine learning, in which a common model is learned while all training data remain on local devices. Although FL is designed to enhance data privacy, we demonstrate in this paper a new direction for inference attacks in the FL setting, where adversaries with very limited power can obtain valuable information about the training data. In particular, we propose three new attacks that exploit this vulnerability. The first, Class Sniffing, detects whether a given label appears in training. The other two determine the quantity of each label: the Quantity Inference attack infers the composition proportion of the training labels owned by the selected clients in a single round, while the Whole Determination attack infers that of the entire training process. We evaluate our attacks on a variety of tasks and datasets under different settings, and the results show that they are generally effective. Finally, we analyze the impact of major hyper-parameters on our attacks and discuss possible defenses.
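The abstract does not spell out the attack mechanism, but one plausible signal behind a Class Sniffing-style attack, well known from prior work on gradient leakage in collaborative learning, is the gradient of the output layer's bias: under softmax cross-entropy, the bias-gradient entry for a class is strictly positive whenever that class is absent from the batch and tends to be negative when it is present. Below is a minimal sketch of that intuition in PyTorch (our illustration, not the authors' code; the toy model, shapes, and sign-at-zero decision rule are all assumptions):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    num_classes, dim, batch = 5, 16, 64

    # Stand-in for a client's output layer during one local FL step.
    head = nn.Linear(dim, num_classes)
    x = torch.randn(batch, dim)                      # fake features
    y = torch.randint(0, num_classes - 1, (batch,))  # label 4 never occurs

    nn.CrossEntropyLoss()(head(x), y).backward()

    # d(loss)/d(bias_c) = mean_i (softmax_i[c] - 1[y_i == c]):
    # strictly positive if class c is absent from the batch,
    # and usually negative if it is present.
    for c in range(num_classes):
        g = head.bias.grad[c].item()
        print(f"class {c}: bias grad {g:+.4f} -> "
              f"{'present' if g < 0 else 'absent'}")

Since the magnitude of a negative entry grows with the class's frequency in the batch, the same quantity could in principle support proportion estimates of the kind the Quantity Inference and Whole Determination attacks produce; whether the paper relies on this exact signal is not stated in the abstract.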
