Coriander: A Toolset for Generating Realistic Android Digital Evidence Datasets

Triage has been suggested as a means of prioritizing and identifying the sources and artifacts of evidence likely to be of most interest when investigators face large amounts of digital evidence. Memory forensics has long relied on simple string matching to triage evidence sources. In this paper, we describe the early stages of our study on machine learning-based triage for memory forensics. As a starting point, we note that no large datasets of memory captures are available. We therefore develop a toolset that automates the creation of realistic Android process memory dumps. Using our toolset, we generate a dataset of 2375 process memory string dumps from both malicious and benign Android applications, classified by VirusTotal and sourced from the AndroZoo project. Our dataset and toolset are made available online to help promote research in this field and related areas.
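To make the string-matching baseline concrete, the following is a minimal sketch of triage over a raw memory dump: it extracts runs of printable ASCII characters and reports which indicator strings occur in them. It is an illustration only, not the toolset described here; the function names `extract_strings` and `triage`, the minimum string length of 4, and the sample indicators are all assumptions made for this example.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least `min_len` printable ASCII bytes found in `data`.

    This mirrors the general idea of the Unix `strings` utility; the
    default threshold of 4 is an assumption, not a value from the paper.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def triage(data: bytes, indicators: list[str]) -> list[str]:
    """Return the indicators that appear as substrings of any extracted string."""
    strings = extract_strings(data)
    return [ind for ind in indicators if any(ind in s for s in strings)]


if __name__ == "__main__":
    # Hypothetical dump contents: two printable runs separated by binary bytes.
    dump = b"\x00\x01http://example.test/c2\x00ab\x00com.example.app\xff"
    print(extract_strings(dump))
    print(triage(dump, ["example.test", "missing"]))
```

A matcher of this kind only flags exact substrings, which is one reason a learned model over string dumps may be worth studying as an alternative.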
