Securing Federated Sensitive Topic Classification against Poisoning Attacks

We present a Federated Learning (FL) based solution for building a distributed classifier capable of detecting URLs containing GDPR-sensitive content related to categories such as health, sexual preference, and political beliefs. Although such a classifier addresses the limitations of previous offline/centralised classifiers, it is still vulnerable to poisoning attacks from malicious users who may attempt to reduce accuracy for benign users by disseminating faulty model updates. To guard against this, we develop a robust aggregation scheme based on subjective logic and residual-based attack detection. Using a combination of theoretical analysis, trace-driven simulation, and experimental validation with a prototype and real users, we show that our classifier can detect sensitive content with high accuracy, learn new labels quickly, and remain robust against poisoning attacks from malicious users as well as imperfect input from non-malicious ones.
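The abstract does not spell out the aggregation rule, so the following is only a minimal sketch of how reputation-weighted aggregation with residual-based flagging could look; it is not the paper's actual scheme. It assumes a binomial subjective-logic opinion with prior weight W = 2 and base rate 0.5, an L1 residual to the coordinate-wise median, and a median-absolute-deviation cutoff. The function names, the threshold, and the sample updates are all hypothetical.

```python
from statistics import median

W = 2.0          # non-informative prior weight of a subjective-logic opinion
BASE_RATE = 0.5  # assumed prior probability that a client is honest


def reputation(positive, negative, base_rate=BASE_RATE):
    """Expected trust: belief + base_rate * uncertainty."""
    total = positive + negative + W
    belief = positive / total
    uncertainty = W / total
    return belief + base_rate * uncertainty


def flag_outliers(updates, threshold=3.0):
    """Return one bool per client; True means the update looks anomalous."""
    dims = range(len(updates[0]))
    center = [median(u[j] for u in updates) for j in dims]
    # Residual of a client: L1 distance to the coordinate-wise median.
    residuals = [sum(abs(u[j] - center[j]) for j in dims) for u in updates]
    med = median(residuals)
    # Median absolute deviation; tiny floor avoids division by zero.
    mad = median(abs(r - med) for r in residuals) or 1e-12
    return [(r - med) / mad > threshold for r in residuals]


def aggregate(updates, history):
    """Reputation-weighted mean of the non-flagged updates.

    history[i] is a (positive, negative) observation count for client i
    and is updated in place after each round.
    """
    flags = flag_outliers(updates)
    weights = []
    for cid, bad in enumerate(flags):
        pos, neg = history[cid]
        history[cid] = (pos, neg + 1) if bad else (pos + 1, neg)
        weights.append(0.0 if bad else reputation(*history[cid]))
    total = sum(weights)
    if total == 0:
        raise ValueError("every update was flagged; nothing to aggregate")
    dims = range(len(updates[0]))
    return [sum(w * u[j] for w, u in zip(weights, updates)) / total
            for j in dims]


# Hypothetical round: three similar updates and one poisoned update.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [10.0, -10.0]]
history = [(0, 0)] * 4
global_update = aggregate(updates, history)
```

In this toy round the fourth client lies far from the coordinate-wise median, so it receives zero weight and one negative observation, while the remaining three clients contribute with equal reputation.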
