Auror: Defending Against Poisoning Attacks in Collaborative Deep Learning Systems

Deep learning in a collaborative setting is emerging as a cornerstone of many upcoming applications, wherein untrusted users collaborate to generate more accurate models. From a security perspective, this opens collaborative deep learning to poisoning attacks, in which adversarial users deliberately alter their inputs to mis-train the model. These attacks are known for machine learning systems in general, but their impact on newer deep learning systems is not well established. We investigate the setting of indirect collaborative deep learning, a practical form of collaborative deep learning in which users submit masked features rather than raw data. Indirect collaborative deep learning is preferred over the direct approach because it distributes the cost of computation and can be made privacy-preserving. In this paper, we study the susceptibility of collaborative deep learning systems to adversarial poisoning attacks. Specifically, we obtain the following empirical results on two popular datasets: handwritten digits (MNIST) and traffic signs (GTSRB), the latter used in self-driving cars. For collaborative deep learning systems, we demonstrate that the attacks achieve a 99% success rate in misclassifying specific target data while poisoning only 10% of the entire training dataset. As a defense, we propose Auror, a system that detects malicious users and generates an accurate model. On these datasets, the accuracy under the deployed defense is nearly unchanged when operating in the absence of attacks, and the accuracy of a model trained using Auror drops by only 3% even when 30% of all users are adversarial. Auror also provides a strong guarantee against evasion: if the attacker tries to evade, its attack effectiveness is bounded.
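To make the setting concrete, the sketch below simulates indirect collaborative training in which a fraction of users submit poisoned updates, and the server applies a simple majority-cluster filter before aggregating. This is only a minimal illustration of the threat model and of filtering anomalous contributions; the toy model, the 2-means filter, and all parameter names are assumptions for the sketch, not Auror's actual algorithm.

```python
# Minimal simulation of indirect collaborative learning under a targeted
# poisoning attack, with a simple majority-cluster defense at the server.
# The model, update rule, and filtering step are illustrative assumptions,
# not the exact mechanism described in the paper.
import numpy as np

rng = np.random.default_rng(0)

DIM = 20            # size of each user's masked-feature update (assumed)
N_USERS = 30        # total collaborating users
N_MALICIOUS = 9     # 30% adversarial users, matching the reported setting
ROUNDS = 50
LR = 0.1

true_direction = rng.normal(size=DIM)       # "benign" update direction
poison_direction = -3.0 * true_direction    # adversaries push the opposite way

def user_update(is_malicious):
    """Each user submits a noisy update; adversaries submit poisoned ones."""
    base = poison_direction if is_malicious else true_direction
    return base + rng.normal(scale=0.5, size=DIM)

def filter_updates(updates, n_clusters=2):
    """Toy defense: 2-means cluster the updates and keep the larger cluster.
    A stand-in for detecting anomalous contributions, not Auror itself."""
    centers = updates[rng.choice(len(updates), n_clusters, replace=False)]
    for _ in range(10):  # a few k-means refinement steps
        dists = np.linalg.norm(updates[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = updates[labels == k].mean(axis=0)
    majority = np.bincount(labels, minlength=n_clusters).argmax()
    return updates[labels == majority]

model = np.zeros(DIM)
for _ in range(ROUNDS):
    updates = np.array([user_update(i < N_MALICIOUS) for i in range(N_USERS)])
    kept = filter_updates(updates)        # server-side filtering of updates
    model += LR * kept.mean(axis=0)       # aggregate only surviving updates

# The defended model should still align with the benign direction.
cosine = model @ true_direction / (
    np.linalg.norm(model) * np.linalg.norm(true_direction))
print(f"kept {len(kept)}/{N_USERS} updates in the last round, "
      f"cosine to benign direction = {cosine:.2f}")
```

Without the filtering step, averaging all updates would let the 30% adversarial users drag the aggregate toward the poisoned direction; with it, the minority cluster of poisoned updates is discarded and the aggregate stays close to the benign one, which is the intuition behind detecting malicious users before aggregation.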
