On defending against label flipping attacks on malware detection systems

Label manipulation attacks are a subclass of data poisoning attacks in adversarial machine learning that target a range of applications, including malware detection. These attacks pose a serious threat to detection systems in environments with high noise rates or uncertainty, such as complex networks and the Internet of Things (IoT). Recent work in the literature has suggested using the K-nearest neighbors (KNN) algorithm to defend against such attacks; however, this approach can suffer from low accuracy and high misclassification rates. In this paper, we design an architecture to tackle the Android malware detection problem in IoT systems. We develop an attack mechanism based on the silhouette clustering method, adapted for mobile Android platforms. We propose two convolutional neural network (CNN)-based deep learning algorithms against this Silhouette Clustering-based Label Flipping Attack. We show the effectiveness of these two defense algorithms, a label-based semi-supervised defense and a clustering-based semi-supervised defense, in correcting labels compromised by the attack. We evaluate the performance of the proposed algorithms by varying the machine learning parameters on three Android datasets (Drebin, Contagio, and Genome) and three types of features (API, intent, and permission). Our evaluation shows that using random forest feature selection and varying ratios of features can improve accuracy by up to 19% compared with the state-of-the-art method in the literature.
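The attack named in the abstract, the Silhouette Clustering-based Label Flipping Attack, ranks training samples by how well they fit their assigned class and flips the labels of the most ambiguous ones. The sketch below is a minimal illustration of that idea, not the authors' exact procedure: it assumes binary 0/1 labels, a numeric feature matrix of API/intent/permission indicators, and a hypothetical flip_budget parameter bounding the fraction of labels the attacker may change; it uses scikit-learn's silhouette_samples to score each sample against the current labels.

```python
# Minimal sketch of a silhouette-based label flipping attack (illustrative,
# not the paper's exact algorithm). Samples with low or negative silhouette
# scores fit the opposite class almost as well as their own, so flipping
# their labels is hard for a defender to detect.
import numpy as np
from sklearn.metrics import silhouette_samples

def silhouette_label_flip(X, y, flip_budget=0.05):
    """Flip the labels of the most ambiguous training samples.

    X: (n_samples, n_features) feature matrix (e.g. permission/API/intent vectors)
    y: (n_samples,) binary labels, 0 = benign, 1 = malware
    flip_budget: fraction of labels the attacker may flip (assumed parameter)
    """
    # Per-sample silhouette score computed against the current labels:
    # values near -1 mean the sample sits closer to the opposite class.
    scores = silhouette_samples(X, y)
    n_flips = int(flip_budget * len(y))
    # Target the lowest-scoring (most boundary-like) samples first.
    flip_idx = np.argsort(scores)[:n_flips]
    y_poisoned = y.copy()
    y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]
    return y_poisoned, flip_idx
```

A KNN-style defense of the kind the abstract refers to would then relabel each training sample to the majority label among its nearest neighbors; the semi-supervised defenses proposed in the paper aim to improve on that baseline.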
