DroidEvolver: Self-Evolving Android Malware Detection System

Given the frequent changes in the Android framework and the continuous evolution of Android malware, it is challenging to detect malware over time in an effective and scalable manner. To address this challenge, we propose DroidEvolver, an Android malware detection system that automatically and continually updates itself during malware detection without any human involvement. While most existing malware detection systems can only be updated by retraining on new applications with true labels, DroidEvolver requires neither retraining nor true labels to update itself. The key insight is that it makes necessary, lightweight updates using online learning techniques with an evolving feature set and pseudo labels. The detection performance of DroidEvolver is evaluated on a dataset of 33,294 benign applications and 34,722 malicious applications developed over a period of six years. Using 6,286 applications dated 2011 as the initial training set, DroidEvolver achieves a high detection F-measure (95.27%), which declines by only 1.06% per year on average over the next five years when classifying 57,539 newly appeared applications. Note that such new applications may use new techniques and new APIs that are unknown to DroidEvolver when it is initialized with the 2011 applications. Compared with the state-of-the-art over-time malware detection system MAMADROID, the F-measure of DroidEvolver is 2.19 times higher on average (10.21 times higher in the fifth year), and the efficiency of DroidEvolver during malware detection is 28.58 times higher than that of MAMADROID. DroidEvolver is also shown to be robust against typical code obfuscation techniques.
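The idea of updating a detector with an evolving feature set and pseudo labels can be illustrated with a minimal sketch. It is not DroidEvolver's actual algorithm, whose model pool and update rules are not described in this abstract. The sketch assumes a single Passive-Aggressive (PA-I) online learner, one of the online learning algorithms in the reference list, and uses the model's own prediction as the pseudo label. The class name `PassiveAggressive`, the function `detect_and_evolve`, and the feature names are hypothetical.

```python
from collections import defaultdict


class PassiveAggressive:
    """Binary PA-I online learner over sparse, open-ended feature sets."""

    def __init__(self, C=1.0):
        self.C = C                   # aggressiveness cap on each update step
        self.w = defaultdict(float)  # features never seen before start at weight 0

    def score(self, x):
        # x maps feature name -> value; unknown features contribute nothing.
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x):
        # +1 = malicious, -1 = benign (a labeling convention assumed here).
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        # Hinge loss; update only when the margin is smaller than 1.
        loss = max(0.0, 1.0 - y * self.score(x))
        norm_sq = sum(v * v for v in x.values())
        if loss > 0.0 and norm_sq > 0.0:
            tau = min(self.C, loss / norm_sq)
            for f, v in x.items():
                self.w[f] += tau * y * v  # new features enter the model here


def detect_and_evolve(model, app_features):
    """Classify an unlabeled app, then update the model with its pseudo label."""
    pseudo_label = model.predict(app_features)
    model.update(app_features, pseudo_label)
    return pseudo_label


# Initialization with true labels (hypothetical toy features).
model = PassiveAggressive()
model.update({"api:sendTextMessage": 1.0, "perm:SEND_SMS": 1.0}, 1)
model.update({"api:setContentView": 1.0}, -1)

# Later: an unlabeled app that uses an API unknown at initialization time.
new_app = {"perm:SEND_SMS": 1.0, "api:newApi2016": 1.0}
label = detect_and_evolve(model, new_app)
```

In this sketch no retraining and no true label is involved after initialization: the previously unseen feature `api:newApi2016` simply acquires a weight during the pseudo-labeled update. A self-training loop of this kind can also reinforce its own mistakes, which is one reason a real system would need additional safeguards beyond what is shown here.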

[1] Gianluca Stringhini, et al. MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models (Extended Version), 2016, NDSS 2017.

[2] Steven C. H. Hoi, et al. Online Learning: A Comprehensive Survey, 2018, Neurocomputing.

[3] Byung-Gon Chun, et al. TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones, 2010, OSDI.

[4] Minhui Xue, et al. StormDroid: A Streaminglized Machine Learning-Based System for Detecting Android Malware, 2016, AsiaCCS.

[5] Christian Platzer, et al. MARVIN: Efficient and Comprehensive Mobile App Classification through Static and Dynamic Analysis, 2015, 2015 IEEE 39th Annual Computer Software and Applications Conference.

[6] Jacques Klein, et al. AndroZoo: Collecting Millions of Android Apps for the Research Community, 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[7] Blaine Nelson, et al. Poisoning Attacks against Support Vector Machines, 2012, ICML.

[8] Yang Liu, et al. Context-Aware, Adaptive, and Scalable Android Malware Detection Through Online Learning, 2017, IEEE Transactions on Emerging Topics in Computational Intelligence.

[9] Yanfang Ye, et al. Deep4MalDroid: A Deep Learning Framework for Android Malware Detection Based on Linux Kernel System Call Graphs, 2016, 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops (WIW).

[10] Patrick D. McDaniel, et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification, 2016, ArXiv.

[11] Angelos Stavrou, et al. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors, 2016, NDSS.

[12] Marius Kloft, et al. Security analysis of online centroid anomaly detection, 2010, J. Mach. Learn. Res.

[13] Juha Karhunen, et al. A pragmatic android malware detection procedure, 2017, Comput. Secur.

[14] Konrad Rieck, et al. DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket, 2014, NDSS.

[15] Jacques Klein, et al. FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps, 2014, PLDI.

[16] Matthew Smith, et al. SoK: Lessons Learned from Android Security Research for Appified Software Platforms, 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[17] Koby Crammer, et al. Adaptive regularization of weight vectors, 2009, Machine Learning.

[18] Adam Doupé, et al. Deep Android Malware Detection, 2017, CODASPY.

[19] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.

[20] Koby Crammer, et al. Online Passive-Aggressive Algorithms, 2003, J. Mach. Learn. Res.

[21] Patrick D. McDaniel, et al. On lightweight mobile phone application certification, 2009, CCS.

[22] Yang Liu, et al. Adaptive and scalable Android malware detection through online learning, 2016, 2016 International Joint Conference on Neural Networks (IJCNN).

[23] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[24] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[25] Ilia Nouretdinov, et al. Transcend: Detecting Concept Drift in Malware Classification Models, 2017, USENIX Security Symposium.

[26] Steven C. H. Hoi, et al. LIBOL: a library for online learning algorithms, 2014, J. Mach. Learn. Res.