Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs

Static detection of polymorphic malware variants plays an important role in improving system security. Control flow has been shown to be an effective characteristic for representing polymorphic malware instances. In our research, we propose a similarity search of malware using novel distance metrics over malware signatures. We describe a malware signature by the set of control flow graphs the malware contains. We propose two distance metrics and use the first to perform pre-filtering. The first metric is based on the distance between feature vectors, where the feature vector is a decomposition of the set of graphs into either fixed-size k-subgraphs or q-gram strings of the high-level source after decompilation. The second metric is more effective but less computationally efficient, and is based on the minimum matching distance. The minimum matching distance uses the string edit distances between programs' decompiled flow graphs, and solves the linear sum assignment problem to construct a minimum-sum-weight matching between two sets of graphs. We implement the distance metrics in a complete malware variant detection system. The evaluation shows that our approach is highly effective, achieving a limited false positive rate, and that our system detects more malware variants than other algorithms.
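To make the two metrics concrete, the following is a minimal sketch rather than the implementation evaluated in this work. It assumes each control flow graph has already been decompiled into a string, so a signature is simply a list of strings. The function names, the choice of the Manhattan distance for feature vectors, the padding of the smaller set with empty strings, and the toy signature strings are all illustrative assumptions. For brevity the assignment problem is solved by brute force over permutations, which is only feasible for very small sets; a practical system would substitute a polynomial-time solver such as the Hungarian method.

```python
from collections import Counter
from itertools import permutations


def qgram_vector(signature, q=2):
    """Count the q-grams occurring in every string of a signature."""
    counts = Counter()
    for s in signature:
        for i in range(len(s) - q + 1):
            counts[s[i:i + q]] += 1
    return counts


def vector_distance(u, v):
    """Manhattan (L1) distance between two sparse q-gram count vectors.

    Assumption: the choice of L1 here is illustrative only.
    """
    return sum(abs(u[g] - v[g]) for g in set(u) | set(v))


def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def min_matching_distance(x, y):
    """Minimum total edit distance over all one-to-one matchings of x and y.

    Assumption: the smaller set is padded with empty strings, so an
    unmatched graph costs the length of its string.
    Brute force stands in for a real linear sum assignment solver.
    """
    n = max(len(x), len(y))
    x = list(x) + [""] * (n - len(x))
    y = list(y) + [""] * (n - len(y))
    cost = [[edit_distance(a, b) for b in y] for a in x]
    return min(sum(cost[i][p[i]] for i in range(n))
               for p in permutations(range(n)))


# Hypothetical signatures: one string per decompiled control flow graph.
sig_a = ["BW{B}R", "BI{B}R"]
sig_b = ["BI{B}R", "BW{BB}R", "BR"]
```

With these toy signatures, `vector_distance(qgram_vector(sig_a), qgram_vector(sig_b))` gives the cheap estimate that could be used for pre-filtering, and `min_matching_distance(sig_a, sig_b)` gives the more expensive distance that would be computed only for candidates surviving the filter. The threshold applied at each stage is left open here, since it is not specified above.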
