FuncTracker: Discovering Shared Code to Aid Malware Forensics

Malware code has forensic value, as is evident from recent studies that link the creators of Duqu and Stuxnet through the similarity of their code. We present FuncTracker, a system built on top of Palantir to discover, visualize, and explore relationships between malware code, with the intent of drawing connections across very large corpora of malware: millions of binaries comprising terabytes of data. To operate at this scale, we forgo the classic data-mining methods that require pairwise comparison of feature vectors and instead represent each malware sample as a set of hashes over carefully selected features. To ensure that a hash match implies a strong match, we represent individual functions using hashes of semantic features, in lieu of the syntactic features commonly used in the literature. A collection of malware is represented as a graph in which function hashes serve as nodes, making it possible to explore the collection using the classic graph operations supported by Palantir. By annotating the nodes with additional information, such as the location and time at which the malware was discovered, one can use the relationships among malware samples to connect otherwise unrelated clues.
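The set-of-hashes idea can be illustrated with a minimal sketch. It is not FuncTracker's implementation: the feature strings, the `function_hash` helper, and the sample names are hypothetical, and a SHA-256 over sorted feature strings merely stands in for a real semantic hash. The point it shows is that an inverted index from function hash to binaries surfaces shared functions without comparing every pair of feature vectors.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def function_hash(semantic_features):
    """Hash a function's features, ignoring their order.

    Assumption: each function is described by a list of feature strings.
    Sorting them is a stand-in for a real semantic normalization.
    """
    canonical = "\n".join(sorted(semantic_features))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_index(corpus):
    """Map each function hash to the set of binaries that contain it."""
    index = defaultdict(set)
    for binary, functions in corpus.items():
        for features in functions:
            index[function_hash(features)].add(binary)
    return index


def shared_functions(index):
    """Count shared function hashes for each pair of binaries.

    Pairs are only formed inside a hash bucket, so binaries with no
    function in common are never compared.
    """
    shared = defaultdict(int)
    for binaries in index.values():
        for a, b in combinations(sorted(binaries), 2):
            shared[(a, b)] += 1
    return dict(shared)


# Hypothetical toy corpus: binary name -> list of functions,
# each function given as a list of made-up semantic feature strings.
corpus = {
    "sample_a": [["eax = arg0 + arg1", "ret eax"], ["mem[arg0] = 0"]],
    "sample_b": [["ret eax", "eax = arg0 + arg1"], ["eax = arg0 * 2"]],
    "sample_c": [["mem[arg0] = 0"]],
}

index = build_index(corpus)
links = shared_functions(index)
print(links)
```

In this toy corpus, `sample_a` shares one function with `sample_b` and one with `sample_c`, while `sample_b` and `sample_c` are never compared because no hash bucket contains both.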
