Deep Ground Truth Analysis of Current Android Malware

To build effective malware analysis techniques and to evaluate new detection tools, up-to-date datasets reflecting the current Android malware landscape are essential. For such datasets to be maximally useful, they need to contain reliable and complete information on malware’s behaviors and techniques used in the malicious activities. Such a dataset shall also provide a comprehensive coverage of a large number of types of malware. The Android Malware Genome created circa 2011 has been the only well-labeled and widely studied dataset the research community had easy access to (As of 12/21/2015 the Genome authors have stopped supporting the dataset sharing due to resource limitation). But not only is it outdated and no longer represents the current Android malware landscape, it also does not provide as detailed information on malware’s behaviors as needed for research. Thus it is urgent to create a high-quality dataset for Android malware. While existing information sources such as VirusTotal are useful, to obtain the accurate and detailed information for malware behaviors, deep manual analysis is indispensable. In this work we present our approach to preparing a large Android malware dataset for the research community. We leverage existing anti-virus scan results and automation techniques in categorizing our large dataset (containing 24,650 malware app samples) into 135 varieties (based on malware behavioral semantics) which belong to 71 malware families. For each variety, we select three samples as representatives, for a total of 405 malware samples, to conduct in-depth manual analysis. Based on the manual analysis result we generate detailed descriptions of each malware variety’s behaviors and include them in our dataset. We also report our observations on the current landscape of Android malware as depicted in the dataset. Furthermore, we present detailed documentation of the process used in creating the dataset, including the guidelines for the manual analysis. We make our Android malware dataset available to the research community.

[1]  Yajin Zhou,et al.  Dissecting Android Malware: Characterization and Evolution , 2012, 2012 IEEE Symposium on Security and Privacy.

[2]  Peng Wang,et al.  Finding Unknown Malice in 10 Seconds: Mass Vetting for New Threats at the Google-Play Scale , 2015, USENIX Security Symposium.

[3]  Christopher Krügel,et al.  What the App is That? Deception and Countermeasures in the Android User Interface , 2015, 2015 IEEE Symposium on Security and Privacy.

[4]  Xiang Pan,et al.  Are these Ads Safe: Detecting Hidden Attacks through the Mobile App-Web Interfaces , 2016, NDSS.

[5]  Sankardas Roy,et al.  Amandroid: A Precise and General Inter-component Data Flow Analysis Framework for Security Vetting of Android Apps , 2014, CCS.

[6]  Konrad Rieck,et al.  DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket , 2014, NDSS.

[7]  Juan Caballero,et al.  AVclass: A Tool for Massive Malware Labeling , 2016, RAID.

[8]  Jacques Klein,et al.  On the Lack of Consensus in Anti-Virus Decisions: Metrics and Insights on Building Ground Truths of Android Malware , 2016, DIMVA.

[9]  Aziz Mohaisen,et al.  AV-Meter: An Evaluation of Antivirus Scans and Labels , 2014, DIMVA.

[10]  Stefano Zanero,et al.  Finding Non-trivial Malware Naming Inconsistencies , 2011, ICISS.

[11]  Jiyong Jang,et al.  Android Malware Clustering through Malicious Payload Mining , 2017, RAID.

[12]  Eric Bodden,et al.  Harvesting Runtime Values in Android Applications That Feature Anti-Analysis Techniques , 2016, NDSS.

[13]  Yanick Fratantonio,et al.  ANDRUBIS -- 1,000,000 Apps Later: A View on Current Android Malware Behaviors , 2014, 2014 Third International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS).

[14]  Jacques Klein,et al.  AndroZoo: Collecting Millions of Android Apps for the Research Community , 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[15]  Aristide Fattori,et al.  CopperDroid: Automatic Reconstruction of Android Malware Behaviors , 2015, NDSS.