Malware Clustering Using Family Dependency Graph

Malware poses a major security threat on the Internet today, and much research has accordingly concentrated on detecting it. Unfortunately, current malware detection approaches are ineffective against new samples: they identify known malware reliably but miss new variants. To address this issue, we propose a novel malware detection approach based on the family dependency graph. First, we trace the API calls of the monitored application. Next, we generate a dependency graph from the dependency relationships among those API calls. Finally, we construct the family dependency graph by clustering the dependency graphs of a known malware family. In this way, we can determine whether a new sample belongs to a known malware family. The evaluation results show that our approach is effective and incurs small overhead compared with existing approaches.
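The three steps in the abstract (trace API calls, build a dependency graph, merge a family's graphs into one family graph) can be illustrated with a minimal sketch. The abstract does not specify how dependencies are defined or how the graphs are clustered, so everything below is an assumption made for illustration: a dependency is taken to exist when a call consumes a handle returned by an earlier call, the family graph keeps edges seen in at least a `min_support` fraction of the family's samples, and `similarity` is the fraction of family edges found in a new sample. The names `ApiCall`, `dependency_graph`, `family_graph`, and `similarity` are hypothetical, and the API names are used only as labels with simplified arguments.

```python
from collections import Counter
from typing import NamedTuple, Optional


class ApiCall(NamedTuple):
    """One traced call (hypothetical record format, not the paper's)."""
    name: str            # API name, used only as a node label
    args: tuple          # handle values the call consumes
    ret: Optional[str]   # handle value the call produces, if any


def dependency_graph(trace):
    """Return a set of (producer_api, consumer_api) edges.

    Assumption: call B depends on call A when B takes as an argument
    a handle that A returned earlier in the trace.
    """
    producer = {}  # handle -> name of the API that most recently returned it
    edges = set()
    for call in trace:
        for arg in call.args:
            if arg in producer:
                edges.add((producer[arg], call.name))
        if call.ret is not None:
            producer[call.ret] = call.name
    return edges


def family_graph(graphs, min_support=0.5):
    """Keep the edges present in at least min_support of the family's graphs.

    Assumption: a simple frequency threshold stands in for the
    clustering step, whose details the abstract does not give.
    """
    counts = Counter(edge for graph in graphs for edge in graph)
    threshold = min_support * len(graphs)
    return {edge for edge, n in counts.items() if n >= threshold}


def similarity(sample_graph, family):
    """Fraction of the family graph's edges that appear in the sample."""
    if not family:
        return 0.0
    return len(sample_graph & family) / len(family)


# Two made-up traces standing in for samples of one known family.
trace_a = [
    ApiCall("CreateFileW", (), "h1"),
    ApiCall("WriteFile", ("h1",), None),
    ApiCall("CloseHandle", ("h1",), None),
]
trace_b = [
    ApiCall("CreateFileW", (), "h7"),
    ApiCall("WriteFile", ("h7",), None),
    ApiCall("RegOpenKeyExW", (), "k1"),
    ApiCall("RegSetValueExW", ("k1",), None),
]
family = family_graph(
    [dependency_graph(trace_a), dependency_graph(trace_b)],
    min_support=1.0,  # keep only edges shared by every sample
)

# A new variant with different handle values but the same behavior.
variant = [
    ApiCall("CreateFileW", (), "x"),
    ApiCall("WriteFile", ("x",), None),
]
unrelated = [
    ApiCall("socket", (), "s"),
    ApiCall("send", ("s",), None),
]
variant_score = similarity(dependency_graph(variant), family)
unrelated_score = similarity(dependency_graph(unrelated), family)
```

Because edges connect API names rather than concrete handle values, a variant that performs the same operations with different handles still matches the family graph. A real system would compare the score against a tuned threshold and would likely need a more robust graph comparison than exact edge overlap.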
