Robust Federated Learning via Collaborative Machine Teaching

For federated learning systems deployed in the wild, data flaws on local agents are common. On one hand, when a large fraction (e.g., over 60%) of the training data is corrupted by systematic sensor noise and environmental perturbations, the performance of federated model training can degrade significantly. On the other hand, it is prohibitively expensive for either clients or service providers to set up manual sanity checks that verify the quality of individual data instances. In this study, we address this challenge by proposing a collaborative and privacy-preserving machine teaching method. Specifically, we use a small set of trusted instances provided by teachers as benign examples in the teaching process. Our collaborative teaching approach jointly seeks the optimal adjustment of the distributed training set such that the model learned from the adjusted training set correctly predicts the labels of the trusted items. The proposed method couples the processes of teaching and learning, and thus directly produces a prediction model that remains robust even under pervasive systematic data corruption. An experimental study on real benchmark data sets demonstrates the validity of our method.
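
To make the coupling of teaching and learning concrete, the following is a minimal bilevel sketch consistent with the description above; the notation (local sets D_k, trusted set T, teaching adjustment delta, and the sparsity penalty) is our own illustration and not necessarily the paper's exact formulation.

% Sketch of a collaborative teaching objective (our notation, illustrative only).
% D_k    : training set on local agent k (k = 1, ..., K), possibly corrupted
% T      : small trusted set supplied by the teachers
% \delta : the teacher's adjustment of the distributed training data
%          (e.g., label changes \tilde{y}_i(\delta) or per-example weights)
% Outer problem: tune \delta so the learned model fits the trusted items;
% inner problem: the learner minimizes empirical risk on the adjusted data.
\begin{aligned}
  \min_{\delta}\quad & \sum_{(x,\,y)\in T} \ell\bigl(f_{\hat{w}(\delta)}(x),\, y\bigr)
    \;+\; \lambda\,\lVert \delta \rVert_{1} \\
  \text{s.t.}\quad & \hat{w}(\delta) \;\in\;
    \operatorname*{arg\,min}_{w} \sum_{k=1}^{K} \sum_{(x_i,\, y_i)\in D_k}
    \ell\bigl(f_{w}(x_i),\, \tilde{y}_i(\delta)\bigr).
\end{aligned}

Under this reading, coupling teaching and learning means solving the outer and inner problems jointly across agents, rather than sanitizing the data in a separate preprocessing pass.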
