PrivFL: Practical Privacy-preserving Federated Regressions on High-dimensional Data over Mobile Networks

Federated Learning (FL) enables a large number of users to jointly learn a shared machine learning (ML) model, coordinated by a centralized server, where the data is distributed across multiple devices. This approach enables the server or users to train and learn an ML model using gradient descent, while keeping all the training data on users' devices. We consider training an ML model over a mobile network where user dropout is a common phenomenon. Although federated learning was aimed at reducing data privacy risks, the ML model privacy has not received much attention. In this work, we present PrivFL, a privacy-preserving system for training (predictive) linear and logistic regression models and oblivious predictions in the federated setting, while guaranteeing data and model privacy as well as ensuring robustness to users dropping out in the network. We design two privacy-preserving protocols for training linear and logistic regression models based on an additive homomorphic encryption (HE) scheme and an aggregation protocol. Exploiting the training algorithm of federated learning, at the core of our training protocols is a secure multiparty global gradient computation on alive users' data. We analyze the security of our training protocols against semi-honest adversaries. As long as the aggregation protocol is secure under the aggregation privacy game and the additive HE scheme is semantically secure, PrivFL guarantees the users' data privacy against the server, and the server's regression model privacy against the users. We demonstrate the performance of PrivFL on real-world datasets and show its applicability in the federated learning system.

[1]  I-Cheng Yeh,et al.  The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients , 2009, Expert Syst. Appl..

[2]  Payman Mohassel,et al.  SecureML: A System for Scalable Privacy-Preserving Machine Learning , 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[3]  Blaise Agüera y Arcas,et al.  Federated Learning of Deep Networks using Model Averaging , 2016, ArXiv.

[4]  Eike Kiltz,et al.  Secure Computation of the Mean and Related Statistics , 2005, IACR Cryptol. ePrint Arch..

[5]  Jerome P. Reiter,et al.  Privacy-Preserving Analysis of Vertically Partitioned Data Using Secure Matrix Products , 2009 .

[6]  Rui Zhang,et al.  A Hybrid Approach to Privacy-Preserving Federated Learning , 2018, Informatik Spektrum.

[7]  Yunghsiang Sam Han,et al.  Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification , 2004, SDM.

[8]  Richard Nock,et al.  Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption , 2017, ArXiv.

[9]  Adi Shamir,et al.  How to share a secret , 1979, CACM.

[10]  Xiaodong Lin,et al.  Secure Regression on Distributed Databases , 2005 .

[11]  Peter Richtárik,et al.  Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.

[12]  Sarvar Patel,et al.  Practical Secure Aggregation for Privacy-Preserving Machine Learning , 2017, IACR Cryptol. ePrint Arch..

[13]  Yao Lu,et al.  Oblivious Neural Network Predictions via MiniONN Transformations , 2017, IACR Cryptol. ePrint Arch..

[14]  Tianjian Chen,et al.  A Secure Federated Transfer Learning Framework , 2020, IEEE Intelligent Systems.

[15]  Dan Boneh,et al.  Prio: Private, Robust, and Scalable Computation of Aggregate Statistics , 2017, NSDI.

[16]  Xiaodong Lin,et al.  Regression on Distributed Databases via Secure Multi-Party Computation , 2004, DG.O.

[17]  Somesh Jha,et al.  Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures , 2015, CCS.

[18]  Bart Goethals,et al.  On Private Scalar Product Computation for Privacy-Preserving Data Mining , 2004, ICISC.

[19]  Yang Liu,et al.  Secure Federated Transfer Learning , 2018, ArXiv.

[20]  Xiaoqian Jiang,et al.  Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation , 2018, IACR Cryptol. ePrint Arch..

[21]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[22]  S. Fienberg,et al.  Secure multiple linear regression based on homomorphic encryption , 2011 .

[23]  Marc Joye,et al.  Efficient Cryptosystems From 2k-th Power Residue Symbols , 2013, IACR Cryptol. ePrint Arch..

[24]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[25]  Pascal Van Hentenryck,et al.  Privacy-Preserving Federated Data Sharing , 2019, AAMAS.

[26]  K. Mandal,et al.  NIKE-based Fast Privacy-preserving High-dimensional Data Aggregation for Mobile Devices , 2018 .

[27]  Athanasios Tsanas,et al.  Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools , 2012 .

[28]  A. Yao,et al.  Fair exchange with a semi-trusted third party (extended abstract) , 1997, CCS '97.

[29]  Dan Bogdanov,et al.  Rmind: A Tool for Cryptographically Secure Statistical Analysis , 2016, IEEE Transactions on Dependable and Secure Computing.

[30]  Peter Rindal,et al.  ABY3: A Mixed Protocol Framework for Machine Learning , 2018, IACR Cryptol. ePrint Arch..

[31]  Hadi Fanaee-T,et al.  Event labeling combining ensemble detectors and background knowledge , 2014, Progress in Artificial Intelligence.

[32]  Hubert Eichner,et al.  Towards Federated Learning at Scale: System Design , 2019, SysML.

[33]  Frederik Vercauteren,et al.  Privacy-preserving logistic regression training , 2018, BMC Medical Genomics.

[34]  Mariana Raykova,et al.  Privacy-Preserving Distributed Linear Regression on High-Dimensional Data , 2017, Proc. Priv. Enhancing Technol..

[35]  Paulo Cortez,et al.  Modeling wine preferences by data mining from physicochemical properties , 2009, Decis. Support Syst..

[36]  Stratis Ioannidis,et al.  Privacy-Preserving Ridge Regression on Hundreds of Millions of Records , 2013, 2013 IEEE Symposium on Security and Privacy.

[37]  Craig Gentry,et al.  Fully homomorphic encryption using ideal lattices , 2009, STOC '09.

[38]  Max A. Little,et al.  Accurate Telemonitoring of Parkinson's Disease Progression by Noninvasive Speech Tests , 2009, IEEE Transactions on Biomedical Engineering.

[39]  Wenliang Du,et al.  Privacy-preserving cooperative statistical analysis , 2001, Seventeenth Annual Computer Security Applications Conference.

[40]  Stefan Katzenbeisser,et al.  Two Is Not Enough: Privacy Assessment of Aggregation Schemes in Smart Metering , 2017, Proc. Priv. Enhancing Technol..

[41]  Ganesh Iyer,et al.  A Usability Evaluation of Tor Launcher , 2017, Proc. Priv. Enhancing Technol..

[42]  Michael Naehrig,et al.  ML Confidential: Machine Learning on Encrypted Data , 2012, ICISC.

[43]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[44]  Jung Hee Cheon,et al.  Logistic regression model training based on the approximate homomorphic encryption , 2018, BMC Medical Genomics.

[45]  Fan Zhang,et al.  Stealing Machine Learning Models via Prediction APIs , 2016, USENIX Security Symposium.

[46]  Xiaodong Lin,et al.  Privacy preserving regression modelling via distributed computation , 2004, KDD.

[47]  Whitfield Diffie,et al.  New Directions in Cryptography , 1976, IEEE Trans. Inf. Theory.

[48]  Ron Kohavi,et al.  Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid , 1996, KDD.

[49]  Vitaly Shmatikov,et al.  Privacy-preserving deep learning , 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[50]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[51]  Yoshinori Aono,et al.  Scalable and Secure Logistic Regression via Homomorphic Encryption , 2016, IACR Cryptol. ePrint Arch..

[52]  Emmanuel Bresson,et al.  A Simple Public-Key Cryptosystem with a Double Trapdoor Decryption Mechanism and Its Applications , 2003, ASIACRYPT.

[53]  Hui Li,et al.  Privacy-preserving logistic regression outsourcing in cloud computing , 2013, Int. J. Grid Util. Comput..