Privacy Leakage of Real-World Vertical Federated Learning

Federated learning enables mutually distrusting participants to collaboratively learn a distributed machine learning model without revealing anything but the model's output. Generic federated learning has been studied extensively, and several learning protocols, as well as open-source frameworks, have been developed. Yet, their pursuit of computational efficiency and rapid implementation may diminish the security and privacy guarantees for participants' training data, about which little is known thus far. In this paper, we consider an honest-but-curious adversary who participates in training a distributed ML model, does not deviate from the defined learning protocol, but attempts to infer private training data from the legitimately received information. In this setting, we design and implement two practical attacks, the reverse sum attack and the reverse multiplication attack, neither of which affects the accuracy of the learned model. By empirically studying the privacy leakage of two learning protocols, we show that our attacks are (1) effective: the adversary successfully steals the private training data, even when the intermediate outputs are encrypted to protect data privacy; (2) evasive: the adversary's behavior neither deviates from the protocol specification nor degrades the accuracy of the target model; and (3) easy: the adversary needs little prior knowledge about the data distribution of the target participant. We also show experimentally that the leaked information is as effective as the raw training data by training an alternative classifier on it. We further discuss potential countermeasures and their challenges, which we hope will lead to several promising research directions.
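
The abstract does not spell out the attack mechanics, but the core intuition behind a reverse-multiplication-style attack on vertically partitioned training can be sketched as follows. This is a hypothetical, simplified illustration (not the paper's actual protocol): if an honest-but-curious participant legitimately receives, across training iterations, enough intermediate products whose coefficient vectors it knows (for example, its own model updates), then the other participant's private feature vector is recoverable by solving a linear system. All variable names below are illustrative.

```python
# Hypothetical sketch of the intuition behind a reverse-multiplication-style
# attack, NOT the paper's exact protocol. Assumption: the attacker observes
# intermediate products y_t = <w_t, x> where the coefficient vectors w_t are
# known to it, and x is the victim's private feature vector.
import numpy as np

rng = np.random.default_rng(0)

d = 16                                # number of private features held by the victim
x_private = rng.normal(size=d)        # victim's feature vector (unknown to attacker)

# Coefficient vectors known to the attacker, one per observed iteration.
# With >= d linearly independent observations, x is fully determined.
W = rng.normal(size=(d, d))
y_observed = W @ x_private            # intermediate products the attacker receives

# Recover the private features by least squares; with d independent
# equations this pins x down exactly (up to numerical error).
x_recovered, *_ = np.linalg.lstsq(W, y_observed, rcond=None)

print(np.allclose(x_recovered, x_private))   # True: private data reconstructed
```

Note that this toy sketch works on plaintext products; the paper's contribution is showing that comparable leakage persists even when the intermediate outputs exchanged by the protocol are homomorphically encrypted.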
