FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties, where each party keeps its data private and only model updates are shared. Most existing approaches have focused on horizontal FL, yet many real-world scenarios follow a vertically partitioned FL setup, in which a complete feature set is formed only when the datasets of all parties are combined, and the labels are available to a single party only. Privacy-preserving vertical FL is challenging because no single entity owns the complete set of labels and features. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models, including linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes, and it works for larger and changing sets of parties. We empirically demonstrate its applicability to multiple ML models and show a reduction of 10%-70% in training time and 80%-90% in data transfer compared with state-of-the-art approaches.
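To illustrate the vertical decomposition that FedV builds on, below is a minimal plaintext sketch of how a linear-model prediction and gradient split into per-party pieces when features are vertically partitioned. The `Party` class, variable names, and training loop are illustrative assumptions, not the paper's implementation; in FedV, the plain summation of per-party partial inner products would be replaced by functional-encryption-based secure aggregation so that the aggregator learns only the aggregated value.

```python
# Minimal plaintext sketch (assumed setup, not FedV itself): a linear model's
# prediction <w, x> decomposes into per-party partial inner products when each
# party holds a disjoint slice of the feature columns. FedV would replace the
# plain summation below with inner-product functional encryption.
import numpy as np

class Party:
    def __init__(self, features):
        self.X = features                       # this party's vertical slice of the data
        self.w = np.zeros(features.shape[1])    # weights for its own feature columns

    def partial_inner_product(self):
        # <w_p, x_p> for every sample; in FedV this contribution would be encrypted
        return self.X @ self.w

    def partial_gradient(self, residual):
        # Gradient w.r.t. this party's own weights, given the shared residual
        return self.X.T @ residual / len(residual)

# Two parties hold disjoint feature columns of the same samples (vertical split).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = rng.normal(size=100)
parties = [Party(X[:, :3]), Party(X[:, 3:])]

for _ in range(50):
    # Only the SUM of partial inner products is needed, which is exactly what
    # inner-product functional encryption can reveal without exposing each term.
    prediction = sum(p.partial_inner_product() for p in parties)
    residual = prediction - y                   # computed by the label-owning party
    for p in parties:
        p.w -= 0.1 * p.partial_gradient(residual)
```

The key observation is that the aggregation step needs only the sum across parties, never any individual party's partial value, which is why an inner-product functional encryption scheme can stand in for the plaintext sum without peer-to-peer exchanges.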
