Frequent Subgraph Based Familial Classification of Android Malware

The rapid growth of Android malware poses great challenges to anti-malware systems because the sheer number of malware samples overwhelm malware analysis systems. A promising approach for speeding up malware analysis is to classify malware samples into families so that the common features in malwares belonging to the same family can be exploited for malware detection and inspection. However, the accuracy of existing classification solutions is limited because of two reasons. First, since the majority of Android malware is constructed by inserting malicious components into popular apps, the malware's legitimate part may misguide the classification algorithms. Second, the polymorphic variants of Android malware could evade the detection by employing transformation attacks. In this paper, we propose a novel approach that constructs frequent subgraph (fregraph) to represent the common behaviors of malwares in the same family for familial classification of Android malware. Moreover, we propose and develop FalDroid, an automatic system for classifying Android malware according to fregraph, and apply it to 6,565 malware samples from 30 families. The experimental results show that FalDroid can correctly classify 94.5% malwares into their families using around 4.4s per app.

[1]  King-Sun Fu,et al.  A distance measure between attributed relational graphs for pattern recognition , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[2]  Aristide Fattori,et al.  CopperDroid: Automatic Reconstruction of Android Malware Behaviors , 2015, NDSS.

[3]  Xuxian Jiang,et al.  Catch Me If You Can: Evaluating Android Anti-Malware Against Transformation Attacks , 2014, IEEE Transactions on Information Forensics and Security.

[4]  Juan E. Tapiador,et al.  Dendroid: A text mining approach to analyzing and classifying code structures in Android malware families , 2014, Expert Syst. Appl..

[5]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[6]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[7]  Kang G. Shin,et al.  Large-scale malware indexing using function-call graphs , 2009, CCS.

[8]  Kam-Fai Wong,et al.  Interpreting TF-IDF term weights as making relevance decisions , 2008, TOIS.

[9]  Yajin Zhou,et al.  Fast, scalable detection of "Piggybacked" mobile applications , 2013, CODASPY.

[10]  Joris Kinable,et al.  Malware classification based on call graph clustering , 2010, Journal in Computer Virology.

[11]  Dawn Xiaodong Song,et al.  Contextual Policy Enforcement in Android Applications with Permission Event Graphs , 2013, NDSS.

[12]  Martin Rosvall,et al.  Maps of random walks on complex networks reveal community structure , 2007, Proceedings of the National Academy of Sciences.

[13]  Yajin Zhou,et al.  Dissecting Android Malware: Characterization and Evolution , 2012, 2012 IEEE Symposium on Security and Privacy.

[14]  Qinghua Zheng,et al.  Exploring community structure of software Call Graph and its applications in class cohesion measurement , 2015, J. Syst. Softw..

[15]  Xiapu Luo,et al.  DexHunter: Toward Extracting Hidden Code from Packed Android Applications , 2015, ESORICS.

[16]  Arun Lakhotia,et al.  DroidLegacy: Automated Familial Classification of Android Malware , 2014, PPREW'14.

[17]  Peng Liu,et al.  Achieving accuracy and scalability simultaneously in detecting application clones on Android markets , 2014, ICSE.

[18]  Lei Zhang,et al.  Towards a scalable resource-driven approach for detecting repackaged Android applications , 2014, ACSAC.

[19]  Konrad Rieck,et al.  Structural detection of android malware using embedded call graphs , 2013, AISec.

[20]  Peng Wang,et al.  Finding Unknown Malice in 10 Seconds: Mass Vetting for New Threats at the Google-Play Scale , 2015, USENIX Security Symposium.

[21]  Yuan Zhang,et al.  Vetting undesirable behaviors in android apps with permission use analysis , 2013, CCS.

[22]  Qinghua Zheng,et al.  Software Plagiarism Detection with Birthmarks Based on Dynamic Key Instruction Sequences , 2015, IEEE Transactions on Software Engineering.

[23]  Mu Zhang,et al.  Semantics-Aware Android Malware Classification Using Weighted Contextual API Dependency Graphs , 2014, CCS.

[24]  Qinghua Zheng,et al.  Exploiting thread-related system calls for plagiarism detection of multithreaded programs , 2016, J. Syst. Softw..

[25]  Akiko Aizawa,et al.  An information-theoretic perspective of tf-idf measures , 2003, Inf. Process. Manag..

[26]  Sencun Zhu,et al.  ViewDroid: towards obfuscation-resilient mobile application repackaging detection , 2014, WiSec '14.

[27]  Giulio Concas,et al.  A study of the community structure of a complex software network , 2013, 2013 4th International Workshop on Emerging Trends in Software Metrics (WETSoM).

[28]  Réka Albert,et al.  Near linear time algorithm to detect community structures in large-scale networks. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[29]  Marcus A. Maloof,et al.  Learning to Detect and Classify Malicious Executables in the Wild , 2006, J. Mach. Learn. Res..

[30]  Isil Dillig,et al.  Apposcopy: semantics-based detection of Android malware through static analysis , 2014, SIGSOFT FSE.

[31]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[32]  Chao Yang,et al.  DroidMiner: Automated Mining and Characterization of Fine-grained Malicious Behaviors in Android Applications , 2014, ESORICS.

[33]  Hao Chen,et al.  AnDarwin: Scalable Detection of Semantically Similar Android Applications , 2013, ESORICS.

[34]  Somesh Jha,et al.  Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors , 2010, 2010 IEEE Symposium on Security and Privacy.

[35]  Hao Chen,et al.  Attack of the Clones: Detecting Cloned Applications on Android Markets , 2012, ESORICS.

[36]  Eric Bodden,et al.  A Machine-learning Approach for Classifying and Categorizing Android Sources and Sinks , 2014, NDSS.

[37]  Juan Enrique Ramos,et al.  Using TF-IDF to Determine Word Relevance in Document Queries , 2003 .

[38]  Xin Sun,et al.  Detecting Code Reuse in Android Applications Using Component-Based Control Flow Graph , 2014, SEC.

[39]  Yuan-Cheng Lai,et al.  Identifying android malicious repackaged applications by thread-grained system call sequences , 2013, Comput. Secur..