Android Malware Detection via Graphlet Sampling

Android systems are widely used in mobile & wireless distributed systems. In the near future, Android is believed to dominate the mobile distributed environment. However, with the popularity of Android-based smartphones/tablets comes the rampancy of Android-based malware. In this paper, we propose a novel topological signature of Android apps based on the function call graphs (FCGs) extracted from their Android App PacKages (APKs). Specifically, by leveraging recent advances on graphlet mining, the proposed method fully captures the invocator-invocatee relationship at local neighborhoods in an FCG without exponentially inflating the state space. Using real benign app and malware samples, we demonstrate that our method, App topologiCal signature through graphleT Sampling (ACTS), can detect malware and identify malware families robustly and efficiently. More importantly, we demonstrate that, without augmenting the FCG with any semantic features such as bytecode-based vertex typing, local topological information captured by ACTS alone can achieve a high malware detection accuracy. Since ACTS only uses structural features, which are orthogonal to semantic features, it is expected that combining them would give a greater improvement in malware detection accuracy than combining non-orthogonal semantic features.

[1]  Gianluca Stringhini,et al.  MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models (Extended Version) , 2016, NDSS 2017.

[2]  S. N. Dorogovtsev,et al.  Size-dependent degree distribution of a scale-free growing network. , 2001, Physical review. E, Statistical, nonlinear, and soft matter physics.

[3]  Mu Zhang,et al.  Semantics-Aware Android Malware Classification Using Weighted Contextual API Dependency Graphs , 2014, CCS.

[4]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[5]  Yajin Zhou,et al.  Detecting repackaged smartphone applications in third-party android marketplaces , 2012, CODASPY '12.

[6]  Jason Nieh,et al.  A measurement study of google play , 2014, SIGMETRICS '14.

[7]  Stephen A. Cook,et al.  The complexity of theorem-proving procedures , 1971, STOC.

[8]  Konrad Rieck,et al.  Structural detection of android malware using embedded call graphs , 2013, AISec.

[9]  Konrad Rieck,et al.  DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket , 2014, NDSS.

[10]  Martin G. Everett,et al.  A Graph-theoretic perspective on centrality , 2006, Soc. Networks.

[11]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[12]  Heng Yin,et al.  DroidAPIMiner: Mining API-Level Features for Robust Malware Detection in Android , 2013, SecureComm.

[13]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[14]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[15]  Mohammad Al Hasan,et al.  GUISE: a uniform sampler for constructing frequency histogram of graphlets , 2013, Knowledge and Information Systems.

[16]  Konrad Rieck,et al.  Modeling and Discovering Vulnerabilities with Code Property Graphs , 2014, 2014 IEEE Symposium on Security and Privacy.

[17]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[18]  Yanick Fratantonio,et al.  ANDRUBIS -- 1,000,000 Apps Later: A View on Current Android Malware Behaviors , 2014, 2014 Third International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS).

[19]  Igor Jurisica,et al.  Modeling interactome: scale-free or geometric? , 2004, Bioinform..

[20]  Edoardo M. Airoldi,et al.  Graphlet decomposition of a weighted network , 2012, AISTATS.

[21]  David J. Spiegelhalter,et al.  Introducing Markov chain Monte Carlo , 1995 .

[22]  Yajin Zhou,et al.  Dissecting Android Malware: Characterization and Evolution , 2012, 2012 IEEE Symposium on Security and Privacy.

[23]  Daniele Sgandurra,et al.  Classifying Android Malware through Subgraph Mining , 2013, DPM/SETOP.

[24]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[25]  Mohammad Al Hasan,et al.  GRAFT: an approximate graphlet counting algorithm for large graph analysis , 2012, CIKM.

[26]  Pedro Ribeiro,et al.  Efficient and Scalable Algorithms for Network Motifs Discovery , 2011 .

[27]  Xuxian Jiang,et al.  DroidChameleon: evaluating Android anti-malware against transformation attacks , 2013, ASIA CCS '13.

[28]  Ainuddin Wahid Abdul Wahab,et al.  A review on feature selection in mobile malware detection , 2015, Digit. Investig..

[29]  John C. S. Lui,et al.  Droid Analytics: A Signature Based Analytic System to Collect, Extract, Analyze and Associate Android Malware , 2013, 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications.

[30]  Xuxian Jiang,et al.  AppInk: watermarking android apps for repackaging deterrence , 2013, ASIA CCS '13.