DroidLegacy: Automated Familial Classification of Android Malware

We present an automated method for extracting familial signatures for Android malware, i.e., signatures that identify malware produced by piggybacking potentially different benign applications with the same (or similar) malicious code. The APK classes that constitute the malicious code in a repackaged application are separated from the benign code, and the Android API calls used by the malicious modules are extracted to create a signature. A piggybacked malicious app can then be detected by first decomposing it into loosely coupled modules and then matching the Android API calls made by each module against the signatures of known malware families. Because the signatures are based on Android API calls, they are tied to the core malware behavior and are thus more resilient to obfuscation. In triage, AV companies need to classify large numbers of samples automatically so as to optimize the assignment of human analysts. They need a system that yields few false negatives, even at the cost of more false positives. With this goal in mind, we fine-tuned our system and verified our algorithm using standard 10-fold cross-validation over a dataset of 1,052 malicious APKs and 48 benign APKs. The results show 94% accuracy, 97% precision, and 93% recall when separating benign apps from malware. We classified our entire malware dataset into 11 families with 98% accuracy, 87% precision, and 94% recall.
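
The matching step described above can be illustrated with a minimal sketch. It assumes that a signature is simply a set of Android API call names and that a module matches a family when the Jaccard similarity between its API calls and the signature reaches a threshold. The family names, the API sets, the similarity measure, and the threshold value are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical family signatures (illustrative only, not taken from the paper):
# each signature is the set of Android API calls used by a family's malicious module.
SIGNATURES: Dict[str, FrozenSet[str]] = {
    "FamilyA": frozenset({
        "SmsManager.sendTextMessage",
        "TelephonyManager.getDeviceId",
        "HttpURLConnection.connect",
    }),
    "FamilyB": frozenset({
        "Runtime.exec",
        "PackageManager.getInstalledPackages",
        "TelephonyManager.getSubscriberId",
    }),
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two API-call sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify_app(
    modules: Dict[str, FrozenSet[str]],
    signatures: Dict[str, FrozenSet[str]],
    threshold: float = 0.5,  # assumed value; lowering it trades false positives for fewer false negatives
) -> Tuple[Optional[str], float]:
    """Return the best-matching family and its score, or (None, 0.0) if no module matches.

    `modules` maps a module name to the API calls made by that loosely coupled
    module; the decomposition itself is assumed to have been done already.
    """
    best_family: Optional[str] = None
    best_score = 0.0
    for api_calls in modules.values():
        for family, signature in signatures.items():
            score = jaccard(api_calls, signature)
            if score >= threshold and score > best_score:
                best_family, best_score = family, score
    return best_family, best_score


# A piggybacked app: one benign module plus one module resembling FamilyA.
piggybacked = {
    "com.example.game": frozenset({"Activity.onCreate", "View.setOnClickListener"}),
    "com.example.hidden": frozenset({
        "SmsManager.sendTextMessage",
        "TelephonyManager.getDeviceId",
        "HttpURLConnection.connect",
        "Log.d",
    }),
}
benign = {
    "com.example.game": frozenset({"Activity.onCreate", "View.setOnClickListener"}),
}

print(classify_app(piggybacked, SIGNATURES))
print(classify_app(benign, SIGNATURES))
```

Scoring each module separately, rather than the app as a whole, reflects the idea in the abstract that the malicious code sits in its own module: the benign host's API calls do not dilute the comparison with a family signature.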
