REFORM: Relevant Features for Malware Analysis

To address the problem of detecting obfuscated malware, we propose a non-signature-based method using machine learning techniques. Mnemonic n-grams are extracted from malware and benign samples. A subset of these mnemonic n-gram features is then chosen using feature selection methods such as Principal Component Analysis (PCA) and Minimum Redundancy and Maximum Relevance (mRMR). These methods select prominent features that can effectively discriminate between malware and benign samples. The method achieves promising results with a very small number of features and better accuracy than previous work, which indicates that it can be used effectively to identify malicious files.
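
The pipeline described in the abstract (mnemonic n-gram extraction followed by relevance-based feature selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the binary presence encoding, the toy samples, and the choice of the greedy "relevance minus mean redundancy" form of mRMR are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mnemonic_ngrams(mnemonics, n=2):
    """Slide a window of size n over a list of opcode mnemonics."""
    return [tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    total = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / total) * log2(c * total / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mrmr_select(columns, labels, k):
    """Greedy mRMR sketch: pick the feature with the highest
    relevance to the labels minus mean redundancy with features
    already selected. `columns` maps feature name -> list of values."""
    relevance = {f: mutual_information(col, labels) for f, col in columns.items()}
    remaining = list(columns)
    selected = []

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(columns[f], columns[s])
                         for s in selected) / len(selected)
        return relevance[f] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: mnemonic sequences with labels (1 = malware, 0 = benign).
samples = [
    (["push", "mov", "xor", "call"], 1),
    (["push", "mov", "xor", "jmp"], 1),
    (["push", "mov", "add", "ret"], 0),
    (["push", "mov", "sub", "ret"], 0),
]
labels = [label for _, label in samples]
grams_per_sample = [set(mnemonic_ngrams(seq, n=2)) for seq, _ in samples]
vocabulary = sorted(set().union(*grams_per_sample))

# Binary presence feature for each bigram across all samples.
columns = {g: [int(g in grams) for grams in grams_per_sample] for g in vocabulary}
top_features = mrmr_select(columns, labels, k=2)
print(top_features)
```

In practice the mnemonic sequences would come from disassembled executables, and the selected features would be passed to a classifier; both steps are outside the scope of this sketch.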
