Malware detection with quantitative data flow graphs

We propose a novel behavioral malware detection approach based on a generic system-wide quantitative data flow model. We base our data flow analysis on the incremental construction of aggregated quantitative data flow graphs. These graphs represent communication between different system entities such as processes, sockets, files or system registries. We demonstrate the feasibility of our approach through a prototypical instantiation and implementation for the Windows operating system. Our experiments yield encouraging results: in our data set of samples from common malware families and popular non-malicious applications, our approach has a detection rate of 96% and a false positive rate of less than 1.6%. In comparison with closely related data flow based approaches, we achieve similar detection effectiveness with considerably better performance: an average full system analysis takes less than one second.

[1]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[2]  Philip K. Chan,et al.  Learning Patterns from Unix Process Execution Traces for Intrusion Detection , 1997 .

[3]  Michael Schatz,et al.  Learning Program Behavior Profiles for Intrusion Detection , 1999, Workshop on Intrusion Detection and Network Monitoring.

[4]  Stefan Axelsson,et al.  The base-rate fallacy and the difficulty of intrusion detection , 2000, TSEC.

[5]  Somesh Jha,et al.  Semantics-aware malware detection , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[6]  Samuel T. King,et al.  Backtracking intrusions , 2005, TOCS.

[7]  Peter Szor,et al.  The Art of Computer Virus Research and Defense , 2005 .

[8]  Christopher Krügel,et al.  Behavior-based Spyware Detection , 2006, USENIX Security Symposium.

[9]  S. Jha,et al.  Mining specifications of malicious behavior , 2007, ESEC-FSE '07.

[10]  Heng Yin,et al.  Panorama: capturing system-wide information flow for malware detection and analysis , 2007, CCS '07.

[11]  Christopher Krügel,et al.  Detecting System Emulators , 2007, ISC.

[12]  Ludovic Mé,et al.  Code obfuscation techniques for metamorphic viruses , 2008, Journal in Computer Virology.

[13]  Jonathon T. Giffin,et al.  Impeding Malware Analysis Using Conditional Code Obfuscation , 2008, NDSS.

[14]  Somesh Jha,et al.  A semantics-based approach to malware detection , 2008, TOPL.

[15]  Christopher Krügel,et al.  A survey on automated dynamic malware-analysis techniques and tools , 2012, CSUR.

[16]  Christopher Krügel,et al.  Effective and Efficient Malware Detection at the End Host , 2009, USENIX Security Symposium.

[17]  Heejo Lee,et al.  Detecting metamorphic malwares using code graphs , 2010, SAC '10.

[18]  Somesh Jha,et al.  A Declarative Framework for Intrusion Analysis , 2010, Cyber Situational Awareness.

[19]  Kangbin Yim,et al.  Malware Obfuscation Techniques: A Brief Survey , 2010, 2010 International Conference on Broadband, Wireless Computing, Communication and Applications.

[20]  Somesh Jha,et al.  Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors , 2010, 2010 IEEE Symposium on Security and Privacy.

[21]  Christopher Krügel,et al.  AccessMiner: using system-centric models for malware protection , 2010, CCS '10.

[22]  Carsten Willems,et al.  Automatic analysis of malware behavior using machine learning , 2011, J. Comput. Secur..

[23]  Somesh Jha,et al.  Dynamic Behavior Matching: A Complexity Analysis and New Approximation Algorithms , 2011, CADE.

[24]  Kieran McLaughlin,et al.  Obfuscation: The Hidden Malware , 2011, IEEE Security & Privacy.

[25]  Barbara G. Ryder,et al.  User-Centric Dependence Analysis For Identifying Malicious Mobile Apps , 2012 .

[26]  Herbert Bos,et al.  Large-Scale Analysis of Malware Downloaders , 2012, DIMVA.

[27]  Alexander Pretschner,et al.  Data Loss Prevention Based on Data-Driven Usage Control , 2012, 2012 IEEE 23rd International Symposium on Software Reliability Engineering.

[28]  Mark Stamp,et al.  Deriving common malware behavior through graph clustering , 2013, Comput. Secur..

[29]  Somesh Jha,et al.  Synthesizing near-optimal malware specifications from suspicious behaviors , 2013, 2013 8th International Conference on Malicious and Unwanted Software: "The Americas" (MALWARE).

[30]  Konrad Rieck,et al.  A close look on n-grams in intrusion detection: anomaly detection vs. classification , 2013, AISec.

[31]  Jiankun Hu,et al.  A Semantic Approach to Host-Based Intrusion Detection Systems Using Contiguousand Discontiguous System Call Patterns , 2014, IEEE Transactions on Computers.