A Privacy-Preserving Federated Learning System for Android Malware Detection Based on Edge Computing

This paper presents a privacy-preserving federated learning (PPFL) system for detecting Android malware. The proposed PPFL system allows mobile devices to collaboratively train a classifier without exposing sensitive information, such as application programming interface (API) calls and permission configurations, or the local model learned by each mobile device. The system is implemented using a support vector machine (SVM) and secure multi-party computation techniques. Its feasibility is demonstrated on the Android malware dataset provided by the National Institute of Information and Communication Technology (NICT), Japan. The experiments evaluate the performance of the classifier trained by the proposed PPFL system and compare it with that of a classifier trained by a centralized system in two use cases: i) each mobile device holds a different dataset, and ii) each mobile device holds different features. The results show that the PPFL classifier outperforms the centrally trained classifier. Moreover, the privacy of app information (i.e., API and permission information) and of the trained local models is guaranteed. To the best of our knowledge, this work is the first Android malware detection system based on privacy-preserving federated learning.
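As a rough illustration of how secure multi-party computation can keep each local model hidden while still producing a global one, the sketch below averages local linear SVM weight vectors using additive secret sharing. The abstract does not specify the protocol, so everything here is an assumption made for illustration: the use of additive sharing, the field modulus `PRIME`, the fixed-point factor `SCALE`, and the helper names `share` and `secure_average` are hypothetical and are not taken from the paper.

```python
import secrets

PRIME = 2**61 - 1   # field modulus (assumed for this sketch)
SCALE = 10**6       # fixed-point scaling factor (assumed for this sketch)


def encode(x):
    """Map a real-valued weight to a field element via fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v):
    """Map a field element back to a signed real value."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value, n):
    """Split a field element into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_average(local_weights):
    """Average weight vectors so that no party sees another party's vector.

    Each device splits every coordinate of its weight vector into one share
    per party. Party j only ever holds the running sum of the shares sent to
    it, which is uniformly random on its own. Combining all partial sums
    reveals only the aggregate.
    """
    n = len(local_weights)
    dim = len(local_weights[0])
    partial = [[0] * dim for _ in range(n)]  # partial[j][k]: party j, coordinate k
    for w in local_weights:
        for k, x in enumerate(w):
            for j, s in enumerate(share(encode(x), n)):
                partial[j][k] = (partial[j][k] + s) % PRIME
    totals = [sum(partial[j][k] for j in range(n)) % PRIME for k in range(dim)]
    return [decode(t) / n for t in totals]


if __name__ == "__main__":
    # Three devices, each with a locally trained 2-dimensional weight vector.
    local_weights = [[0.5, -1.0], [1.5, 2.0], [-0.5, 2.0]]
    print(secure_average(local_weights))
```

In this sketch all parties run in one process, so it shows only the arithmetic of the sharing scheme; a real deployment would send shares over authenticated channels and would need to handle dropouts and malicious participants.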
