Enhancing State-of-the-art Classifiers with API Semantics to Detect Evolved Android Malware

Machine learning (ML) classifiers are widely deployed to detect Android malware, but they face an emerging problem: their performance degrades, or ages, significantly over time as malware evolves. Prior work has proposed retraining or active learning to restore and improve aged models. However, the underlying classifier itself remains blind to malware evolution. Unsurprisingly, such evolution-insensitive retraining or active learning comes at a price: labeling tens of thousands of malware samples, at the cost of significant human effort. In this paper, we propose the first framework, called APIGraph, to enhance state-of-the-art malware classifiers with similarity information among evolved Android malware, expressed in terms of semantically equivalent or similar API usages, thus naturally slowing down classifier aging. Our evaluation shows that, because classifier aging slows down, APIGraph significantly reduces the human effort that active learning requires for labeling new malware samples.
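To make the idea of semantically similar API usages concrete, the following is a minimal sketch and not the APIGraph implementation. It assumes that APIs have already been grouped into clusters by some means; the mapping `API_TO_CLUSTER`, the helper names, and the two sample apps are hypothetical and chosen purely for illustration. The sketch shows how two samples that call different but related APIs look different under per-API features yet identical under cluster-level features.

```python
from typing import Dict, Iterable, List

# Hypothetical grouping of semantically similar APIs (illustrative only;
# how such a grouping would actually be derived is not shown here).
API_TO_CLUSTER: Dict[str, int] = {
    "android.telephony.TelephonyManager.getDeviceId": 0,
    "android.telephony.TelephonyManager.getImei": 0,
    "java.net.HttpURLConnection.connect": 1,
    "javax.net.ssl.HttpsURLConnection.connect": 1,
    "android.telephony.SmsManager.sendTextMessage": 2,
}


def api_features(apis: Iterable[str], vocabulary: List[str]) -> List[int]:
    """Binary vector with one dimension per individual API."""
    used = set(apis)
    return [1 if api in used else 0 for api in vocabulary]


def cluster_features(
    apis: Iterable[str], api_to_cluster: Dict[str, int], n_clusters: int
) -> List[int]:
    """Binary vector with one dimension per API cluster."""
    vec = [0] * n_clusters
    for api in apis:
        cid = api_to_cluster.get(api)
        if cid is not None:  # APIs outside the mapping are ignored
            vec[cid] = 1
    return vec


# Two hypothetical samples: an older one and an "evolved" one that
# switches to different APIs with similar semantics.
old_sample = [
    "android.telephony.TelephonyManager.getDeviceId",
    "java.net.HttpURLConnection.connect",
]
new_sample = [
    "android.telephony.TelephonyManager.getImei",
    "javax.net.ssl.HttpsURLConnection.connect",
]

vocabulary = sorted(API_TO_CLUSTER)
n_clusters = max(API_TO_CLUSTER.values()) + 1

old_api_vec = api_features(old_sample, vocabulary)
new_api_vec = api_features(new_sample, vocabulary)
old_cluster_vec = cluster_features(old_sample, API_TO_CLUSTER, n_clusters)
new_cluster_vec = cluster_features(new_sample, API_TO_CLUSTER, n_clusters)
```

Under these assumptions, a classifier trained on cluster-level vectors sees the evolved sample as equivalent to the older one, which is the intuition behind slowing down classifier aging.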
