On the impact of non-IID data on the performance and fairness of differentially private federated learning

Federated Learning enables distributed data holders to train a shared machine learning model on their collective data. It provides some measure of privacy by not requiring that the data be pooled and centralized, but it has nevertheless been shown to be vulnerable to adversarial attacks. Differential Privacy provides rigorous guarantees and sufficient protection against such attacks and has been widely employed in recent years for privacy-preserving machine learning. A common trait of many recent methods for federated learning and federated differentially private learning is the assumption of IID data, which in real-world scenarios almost certainly does not hold. In this work, we empirically investigate the effect of node-level non-IID data on federated, differentially private deep learning. We show that non-IID data has a negative impact on both the performance and the fairness of the trained model, and we discuss the trade-off between privacy, utility and fairness. Our results highlight the limits of common federated learning algorithms in a differentially private setting to provide robust, reliable results across underrepresented groups.
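To make the setting concrete, the following is a minimal, self-contained sketch (not the authors' implementation) of the kind of pipeline the abstract describes: federated averaging over a node-level non-IID partition, with clipped, noised client updates as a stand-in for differentially private aggregation. The Dirichlet label partition, the logistic-regression local update, and all hyperparameters (number of nodes, clipping norm, noise multiplier) are illustrative assumptions, and no formal privacy accountant is included.

import numpy as np

rng = np.random.default_rng(0)

def dirichlet_partition(labels, n_nodes, alpha=0.1):
    """Split sample indices across nodes; small alpha -> highly non-IID label mix."""
    parts = [[] for _ in range(n_nodes)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_nodes))
        splits = np.split(idx, (np.cumsum(props)[:-1] * len(idx)).astype(int))
        for node, s in zip(parts, splits):
            node.extend(s.tolist())
    return [np.array(p) for p in parts]

def local_update(w, X, y, lr=0.1, epochs=1):
    """One node's local training step: logistic regression via gradient descent."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def dp_fedavg_round(w_global, node_data, clip=1.0, noise_mult=0.5):
    """Clip each node's model delta, average, and add Gaussian noise (DP-FedAvg style)."""
    updates = []
    for X, y in node_data:
        delta = local_update(w_global, X, y) - w_global
        delta *= min(1.0, clip / (np.linalg.norm(delta) + 1e-12))  # norm clipping
        updates.append(delta)
    avg = np.mean(updates, axis=0)
    avg += rng.normal(0.0, noise_mult * clip / len(node_data), size=avg.shape)
    return w_global + avg

# Synthetic binary task; four nodes hold heavily skewed label distributions.
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
parts = dirichlet_partition(y, n_nodes=4, alpha=0.1)
node_data = [(X[p], y[p]) for p in parts if len(p) > 0]  # drop any empty node

w = np.zeros(20)
for _ in range(50):
    w = dp_fedavg_round(w, node_data)

print(f"global accuracy: {np.mean(((X @ w) > 0) == y):.3f}")

Lowering the Dirichlet concentration alpha makes the per-node label distributions more skewed; combined with the added noise, this is the regime in which the study reports degraded utility and fairness for underrepresented groups.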
