Unity in Diversity: Phylogenetic-inspired Techniques for Reverse Engineering and Detection of Malware Families

We developed a framework for abstracting, aligning and analysing malware execution traces, and performed a preliminary exploration of state-of-the-art phylogenetic methods, whose strengths lie in pattern recognition and visualisation, to derive the statistical relationships within two contemporary malware families. We used phylogenetic trees and networks, motifs, logos, composition biases, and tree topology comparison methods to identify common functionality and to study sources of variation in related samples. Networks were more useful than trees for visualising short nop-equivalent code metamorphism, while tree topology comparison was suited to studying variations in multiple sets of homologous procedures. We found that logos could be used for code normalisation, which resulted in a 33% to 62% reduction in the number of instructions. A motif search showed that API sequences related to the management of memory, I/O, libraries and threading do not change significantly amongst malware variants, and composition bias provided an efficient way to distinguish between families. Using context-sensitive procedure analysis, we found that 100% of a set of memory management procedures used by the FakeAV-DO and "Skyhoo" malware families were uniquely identifiable. We discuss how phylogenetic techniques can aid the reverse engineering and detection of malware families and describe some related challenges.
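The abstract does not say how logos were applied to code normalisation, so the following is only an illustrative sketch of one plausible reading: score each column of an alignment of abstracted instruction traces by its sequence-logo information content, and keep the consensus symbol only where the column is well conserved. The function names, the gap symbol `-`, the toy mnemonics and the `min_bits` threshold are all assumptions made for this example, and the small-sample correction used in standard sequence logos is left out.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap symbol produced by the (hypothetical) alignment step


def column_information(column, alphabet_size):
    """Information content in bits of one alignment column.

    Computed as log2(alphabet_size) - H, where H is the Shannon entropy of
    the symbols observed in the column. The small-sample correction term of
    standard sequence logos is omitted for brevity.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def conserved_consensus(aligned_traces, min_bits):
    """Return the consensus symbols of columns with at least min_bits of information.

    aligned_traces is a list of equal-length symbol lists. Gaps count as an
    ordinary symbol when scoring, but a column whose most common symbol is a
    gap is dropped from the consensus.
    """
    alphabet = {sym for trace in aligned_traces for sym in trace}
    consensus = []
    for column in zip(*aligned_traces):
        bits = column_information(column, len(alphabet))
        top, _ = Counter(column).most_common(1)[0]
        if bits >= min_bits and top != GAP:
            consensus.append(top)
    return consensus


# Toy alignment: three variants that differ only in one junk instruction slot.
traces = [
    ["push", "mov", "nop", "call", "ret"],
    ["push", "mov", "xchg", "call", "ret"],
    ["push", "mov", GAP, "call", "ret"],
]
normalised = conserved_consensus(traces, min_bits=2.0)
```

In this toy alignment the third column holds a different symbol in every variant, so its information content falls below the threshold and it is dropped, leaving the four conserved instructions. The reduction figures reported in the abstract come from the authors' own data and method, not from a procedure like this one.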
