Distributed Detection of Malicious Android Apps While Preserving Privacy Using Federated Learning

Deep learning is now widely used to solve computing problems through large-scale data mining. Conventionally, a deep learning model is trained on a central (cloud) server with high computing power, which aggregates data from many clients and performs computationally intensive training. However, aggregating raw data from multiple clients raises privacy concerns that are receiving increasing attention. In federated learning (FL), clients train deep learning models in a distributed fashion on their local data; instead of sending raw data to a central server, they send the parameter values of their trained local models to the server for aggregation. Because FL does not transmit raw data outside the client, it avoids the privacy risks of sharing raw data. In this paper, we present an experimental study of the dynamics of FL-based Android malicious app detection under three data distributions across clients: (i) independent and identically distributed (IID), (ii) non-IID, and (iii) non-IID and unbalanced. Our experiments demonstrate that FL is a feasible and efficient way to detect malicious Android apps in a distributed manner on cellular networks.
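The abstract describes two mechanisms: the server combining client parameter values, and the three ways data can be spread across clients. The sketch below illustrates both in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the aggregation assumes a sample-count-weighted average in the style of federated averaging (FedAvg), and the non-IID split assumes a simple label-sorted sharding scheme. The paper's actual model, aggregation rule, and partitioning procedure are not given in this section.

```python
import random


def partition_iid(labels, num_clients, seed=0):
    """IID split: shuffle sample indices and deal them out evenly."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    return [idx[c::num_clients] for c in range(num_clients)]


def partition_non_iid(labels, weights):
    """Non-IID split (assumed scheme): sort indices by label, then cut
    contiguous shards whose sizes are proportional to `weights`.
    Equal weights give a balanced non-IID split; unequal weights give
    a non-IID and unbalanced split."""
    idx = sorted(range(len(labels)), key=lambda i: labels[i])
    total = sum(weights)
    bounds, acc = [0], 0
    for w in weights:
        acc += w
        bounds.append(len(idx) * acc // total)
    return [idx[bounds[c]:bounds[c + 1]] for c in range(len(weights))]


def fed_avg(client_params, client_sizes):
    """Server-side aggregation (assumed FedAvg-style): average each
    parameter across clients, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Toy labels: 0 = benign, 1 = malicious (hypothetical data).
    labels = [0] * 6 + [1] * 6
    print(partition_iid(labels, 2))
    print(partition_non_iid(labels, [1, 1]))   # non-IID, balanced
    print(partition_non_iid(labels, [1, 3]))   # non-IID, unbalanced
    print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only parameter lists and sample counts reach `fed_avg`; the raw samples referenced by the index lists stay with each client, which is the property the abstract relies on.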
