Malware Analysis and Attribution using Genetic Information

As organizations become ever more dependent on networked operations, they are increasingly vulnerable to cyberattacks from a variety of adversaries, including criminals, terrorists, and nation states. New malware, including viruses, Trojans, and worms, emerges constantly and rapidly. However, attackers often reuse code and techniques from previous attacks. By recognizing elements reused from previous attacks and by detecting patterns in the kinds of modification and reuse observed, we can develop defenses more rapidly, form hypotheses about the source of the malware, and predict and prepare defenses against future attacks. In Malware Analysis and Attribution using Genetic Information (MAAGI), we pursue these objectives by adapting and extending concepts from biology and linguistics. First, analyzing the “genetics” of malware (i.e., reverse-engineered representations of the original program) yields critical information about the program that is not available from the executable alone. Second, the evolutionary process of malware (i.e., the transformation of one species of malware into another) can provide insights into the ancestry of malware, the characteristics of the attacker, and where future attacks might come from and what they might look like. Third, functional linguistics studies the intent behind communicative acts; applied to malware characterization, it can support the study of the intent behind malware behaviors. So far in the program, we have developed a system that uses a range of reverse engineering techniques, including static, dynamic, behavioral, and functional analysis, to cluster malware into families. In some situations, we can also determine malware lineage. Behavioral and functional analysis further allow us to identify a number of the functions and purposes of malware.
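The idea of clustering samples into families by shared "genetic" material can be illustrated with a deliberately simple stand-in. The sketch below is not the MAAGI implementation. It assumes each sample has already been reverse engineered into a sequence of opcode tokens (a hypothetical input format), represents each sample by its set of opcode n-grams, and places two samples in the same family when their Jaccard similarity meets a threshold. Families are merged transitively with union-find, which amounts to single-linkage clustering. The function names and the values of `n` and `threshold` are illustrative assumptions.

```python
from itertools import combinations


def ngrams(tokens, n=3):
    """Return the set of contiguous n-grams over a token sequence (e.g., disassembled opcodes)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_families(samples, n=3, threshold=0.5):
    """Group samples whose n-gram sets are at least `threshold` similar (single linkage).

    samples: dict mapping a sample name to a list of opcode tokens (hypothetical format).
    Returns a sorted list of families, each a sorted list of sample names.
    """
    names = sorted(samples)
    features = {name: ngrams(samples[name], n) for name in names}
    parent = {name: name for name in names}  # union-find forest, one root per family

    def find(x):
        # Walk to the root, halving the path as we go to keep later lookups short.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair once; a similar pair merges the two families it belongs to.
    for a, b in combinations(names, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Toy opcode sequences (made up for illustration): a and b share 3 of 5 distinct 3-grams.
samples = {
    "sample_a": ["push", "mov", "call", "xor", "jmp", "ret"],
    "sample_b": ["push", "mov", "call", "xor", "jmp", "nop"],
    "sample_c": ["lea", "add", "sub", "cmp", "jne", "ret"],
}
print(cluster_families(samples, n=3, threshold=0.5))
# [['sample_a', 'sample_b'], ['sample_c']]
```

Single linkage is used here only because it is easy to follow: one sufficiently similar pair is enough to join two families, so long chains of small modifications still end up together. That same property can also merge unrelated families through a single coincidental match, and a practical system would likely combine static, dynamic, behavioral, and functional features rather than opcode n-grams alone.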
