Malware detection using assembly and API call sequences

One of the major problems concerning information assurance is malicious code. To evade detection, malware has also been encrypted or obfuscated to produce variants that continue to plague properly defended and patched networks with zero day exploits. With malware and malware authors using obfuscation techniques to generate automated polymorphic and metamorphic versions, anti-virus software must always keep up with their samples and create a signature that can recognize the new variants. Creating a signature for each variant in a timely fashion is a problem that anti-virus companies face all the time. In this paper we present detection algorithms that can help the anti-virus community to ensure a variant of a known malware can still be detected without the need of creating a signature; a similarity analysis (based on specific quantitative measures) is performed to produce a matrix of similarity scores that can be utilized to determine the likelihood that a piece of code under inspection contains a particular malware. Two general malware detection methods presented in this paper are: Static Analyzer for Vicious Executables (SAVE) and Malware Examiner using Disassembled Code (MEDiC). MEDiC uses assembly calls for analysis and SAVE uses API calls (Static API call sequence and Static API call set) for analysis. We show where Assembly can be superior to API calls in that it allows a more detailed comparison of executables. API calls, on the other hand, can be superior to Assembly for its speed and its smaller signature. Our two proposed techniques are implemented in SAVE) and MEDiC. We present experimental results that indicate that both of our proposed techniques can provide a better detection performance against obfuscated malware. We also found a few false positives, such as those programs that use network functions (e.g. PuTTY) and encrypted programs (no API calls or assembly functions are found in the source code) when the thresholds are set 50% similarity measure. However, these false positives can be minimized, for example by changing the threshold value to 70% that determines whether a program falls in the malicious category or not.

[1]  Hervé Debar,et al.  Building an Intrusion-Detection System to Detect Suspicious Process Behavior , 1999, Recent Advances in Intrusion Detection.

[2]  Christian S. Collberg,et al.  Watermarking, Tamper-Proofing, and Obfuscation-Tools for Software Protection , 2002, IEEE Trans. Software Eng..

[3]  W C Wilson,et al.  Activity Pattern Analysis by Means of Sequence-Alignment Methods , 1998 .

[4]  Christian S. Collberg,et al.  A Taxonomy of Obfuscating Transformations , 1997 .

[5]  Stephanie Forrest,et al.  Intrusion Detection Using Sequences of System Calls , 1998, J. Comput. Secur..

[6]  Andrew H. Sung,et al.  Static analyzer of vicious executables (SAVE) , 2004, 20th Annual Computer Security Applications Conference.

[7]  Dae-Ki Kang,et al.  Learning classifiers for misuse and anomaly detection using a bag of system calls representation , 2005, Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop.

[8]  Marc Dacier,et al.  Intrusion Detection Using Variable-Length Audit Trail Patterns , 2000, Recent Advances in Intrusion Detection.

[9]  Thomas Dullien,et al.  Graph-based comparison of Executable Objects , 2005 .

[10]  Somesh Jha,et al.  Static Analysis of Executables to Detect Malicious Patterns , 2003, USENIX Security Symposium.