Deriving common malware behavior through graph clustering

Detecting malicious software (malware) remains difficult because attackers keep devising new ways to evade existing detection methods, and the proliferation of malware and its variants calls for more advanced detection techniques. This paper proposes a method for constructing a common behavioral graph that represents the execution behavior of a family of malware instances. The method clusters a set of individual behavioral graphs, each of which represents kernel objects and their attributes derived from system call traces, into a single common behavioral graph. The resulting graph contains a common path, called HotPath, that is observed in every malware instance of the same family. The proposed method achieves high detection rates with false positive rates close to 0%. The derived common behavioral graph scales well as new instances are added, and it is robust against system call attacks.
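
The idea of merging per-instance behavioral graphs and keeping what all instances share can be illustrated with a minimal sketch. It is not the paper's clustering algorithm: exact-label edge counting stands in for graph clustering here, and the edge representation (source kernel object, operation, target kernel object), the function names, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Assumed representation: a behavioral graph is a set of labeled edges
# (source kernel object, operation, target kernel object).
Edge = Tuple[str, str, str]


def merge_graphs(graphs: List[Set[Edge]]) -> Counter:
    """Union all edges and count how many instances contain each one."""
    counts: Counter = Counter()
    for graph in graphs:
        # Each graph is a set, so an edge counts at most once per instance.
        counts.update(graph)
    return counts


def hot_path(counts: Counter, n_instances: int) -> Set[Edge]:
    """Return the edges observed in every instance (a stand-in for HotPath)."""
    return {edge for edge, count in counts.items() if count == n_instances}


# Made-up traces for three instances of one hypothetical family.
family = [
    {("proc", "open", "file:a"), ("file:a", "write", "reg:run"), ("proc", "connect", "sock:1")},
    {("proc", "open", "file:a"), ("file:a", "write", "reg:run"), ("proc", "create", "mutex:x")},
    {("proc", "open", "file:a"), ("file:a", "write", "reg:run")},
]

common = merge_graphs(family)
hot = hot_path(common, len(family))
```

Matching edges by exact label keeps the sketch short. Real traces would likely need approximate matching of kernel objects and attributes, which this sketch does not attempt.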
