Mutual Information Regularization for Vertical Federated Learning

Vertical Federated Learning (VFL) is widely used in real-world applications to enable collaborative learning while protecting data privacy and security. However, prior work shows that parties without labels (passive parties) in VFL can infer the sensitive label information owned by the party with labels (the active party), or mount backdoor attacks against the VFL system. Meanwhile, the active party can also infer sensitive feature information from the passive parties. These threats pose new privacy and security challenges for VFL systems. We propose a new, general defense that limits the mutual information between the private raw data, including both features and labels, and the intermediate outputs exchanged during training, achieving a better trade-off between model utility and privacy. We term this defense Mutual Information Regularization Defense (MID). We theoretically and empirically demonstrate the effectiveness of MID in defending against existing VFL attacks, including label inference attacks, backdoor attacks, and feature reconstruction attacks.
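
The abstract does not spell out how the mutual information term is made tractable; a standard route is the variational information bottleneck, which upper-bounds the mutual information between a layer's input and its stochastic output by a KL divergence to a fixed Gaussian prior. The sketch below illustrates how such a regularizer could sit between a party's local model and the intermediate output it transmits. This is a minimal sketch under that assumption, not the paper's reference implementation; the class name MIDLayer, the bottleneck width, and the weight lam are illustrative.

    import torch
    import torch.nn as nn

    class MIDLayer(nn.Module):
        """Stochastic bottleneck between a party's local model and the
        intermediate output it sends to the other party in VFL.
        (Illustrative sketch; names and sizes are assumptions.)"""

        def __init__(self, in_dim: int, bottleneck_dim: int):
            super().__init__()
            self.encode = nn.Linear(in_dim, 2 * bottleneck_dim)  # -> (mu, log_var)
            self.decode = nn.Linear(bottleneck_dim, in_dim)      # restore original width

        def forward(self, h: torch.Tensor):
            mu, log_var = self.encode(h).chunk(2, dim=-1)
            std = torch.exp(0.5 * log_var)
            z = mu + std * torch.randn_like(std)  # reparameterization trick
            # Closed-form KL( N(mu, sigma^2) || N(0, I) ), averaged over the
            # batch; penalizing this term upper-bounds I(h; z).
            kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(dim=-1).mean()
            return self.decode(z), kl

    # Hypothetical training step: the task loss trades off against the
    # mutual-information penalty through the regularization weight lam.
    #   out, kl = mid(local_model(x))
    #   loss = task_loss(head(out), y) + lam * kl

Raising lam tightens the bottleneck, which should weaken label inference and feature reconstruction at some cost in task accuracy; under this reading, the claimed utility-privacy trade-off is controlled by this single knob.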
