SVM Training Phase Reduction Using Dataset Feature Filtering for Malware Detection

N-gram analysis is an approach that investigates the structure of a program using bytes, characters, or text strings. A key issue with N-gram analysis is feature selection amidst the explosion of features that occurs when N is increased. The experiments within this paper represent programs as operational code (opcode) density histograms gained through dynamic analysis. A support vector machine is used to create a reference model, which is used to evaluate two methods of feature reduction, which are “area of intersect” and “subspace analysis using eigenvectors.” The findings show that the relationships between features are complex and simple statistics filtering approaches do not provide a viable approach. However, eigenvector subspace analysis produces a suitable filter.

[1]  Yoseba K. Penya,et al.  N-grams-based File Signatures for Malware Detection , 2009, ICEIS.

[2]  Yuval Elovici,et al.  Unknown Malcode Detection Using OPCODE Representation , 2008, EuroISI.

[3]  James Newsome,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and SignatureGeneration of Exploits on Commodity Software , 2005, NDSS.

[4]  Guillaume Bonfante,et al.  Control Flow Graphs as Malware Signatures , 2007 .

[5]  Stephanie Forrest,et al.  Intrusion Detection Using Sequences of System Calls , 1998, J. Comput. Secur..

[6]  Chih-Jen Lin,et al.  A Practical Guide to Support Vector Classication , 2008 .

[7]  Angel R. Martinez,et al.  MATLAB Statistics Toolbox , 2001 .

[8]  Vijay Laxmi,et al.  Static CFG analyzer for metamorphic Malware code , 2009, SIN '09.

[9]  R. Sekar,et al.  A fast automaton-based method for detecting anomalous program behaviors , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[10]  Qinghua Zhang,et al.  MetaAware: Identifying Metamorphic Malware , 2007, Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007).

[11]  Christopher Krügel,et al.  Effective and Efficient Malware Detection at the End Host , 2009, USENIX Security Symposium.

[12]  Robert J. Vanderbei,et al.  Linear Programming: Foundations and Extensions , 1998, Kluwer international series in operations research and management service.

[13]  Xu Chen,et al.  Towards an understanding of anti-virtualization and anti-debugging behavior in modern malware , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[14]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[15]  N. Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods: Kernel-Induced Feature Spaces , 2000 .

[16]  Arun Lakhotia,et al.  A method for detecting obfuscated calls in malicious binaries , 2005, IEEE Transactions on Software Engineering.

[17]  Engin Kirda,et al.  A View on Current Malware Behaviors , 2009, LEET.

[18]  Karl N. Levitt,et al.  Execution monitoring of security-critical programs in distributed systems: a specification-based approach , 1997, Proceedings. 1997 IEEE Symposium on Security and Privacy (Cat. No.97CB36097).

[19]  Ke Wang,et al.  Fileprints: identifying file types by n-gram analysis , 2005, Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop.

[20]  Jesse C. Rabek,et al.  Detection of injected, dynamically generated, and obfuscated malicious code , 2003, WORM '03.

[21]  Shigeki Goto,et al.  A new intrusion detection method based on process profiling , 2002, Proceedings 2002 Symposium on Applications and the Internet (SAINT 2002).

[22]  Igor Santos,et al.  Opcode sequences as representation of executables for data-mining-based unknown malware detection , 2013, Inf. Sci..

[23]  Yoseba K. Penya,et al.  Idea: Opcode-Sequence-Based Malware Detection , 2010, ESSoS.

[24]  Somesh Jha,et al.  Semantics-aware malware detection , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[25]  Daniel R. Ellis,et al.  A behavioral approach to worm detection , 2004, WORM '04.

[26]  Igor Santos,et al.  Using opcode sequences in single-class learning to detect unknown malware , 2011, IET Inf. Secur..

[27]  Bernhard Schölkopf,et al.  Kernel Principal Component Analysis , 1997, ICANN.

[28]  Salvatore J. Stolfo,et al.  On the infeasibility of modeling polymorphic shellcode , 2009, Machine Learning.

[29]  Daniel Bilar,et al.  Opcodes as predictor for malware , 2007, Int. J. Electron. Secur. Digit. Forensics.

[30]  Yuval Elovici,et al.  Detecting unknown malicious code by applying classification techniques on OpCode patterns , 2012, Security Informatics.