An intelligent PE-malware detection system based on association mining

The proliferation of malware has presented a serious threat to the security of computer systems. Traditional signature-based anti-virus systems fail to detect polymorphic/metamorphic and new, previously unseen malicious executables. Data mining methods such as Naive Bayes and Decision Tree have been studied on small collections of executables. In this paper, resting on the analysis of Windows APIs called by PE files, we develop the Intelligent Malware Detection System (IMDS) using Objective-Oriented Association (OOA) mining based classification. IMDS is an integrated system consisting of three major modules: PE parser, OOA rule generator, and rule based classifier. An OOA_Fast_FP-Growth algorithm is adapted to efficiently generate OOA rules for classification. A comprehensive experimental study on a large collection of PE files obtained from the anti-virus laboratory of KingSoft Corporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that the accuracy and efficiency of our IMDS system outperform popular anti-virus software such as Norton AntiVirus and McAfee VirusScan, as well as previous data mining based detection systems which employed Naive Bayes, Support Vector Machine (SVM) and Decision Tree techniques. Our system has already been incorporated into the scanning tool of KingSoft’s Anti-Virus software.

[1]  John A. Swets,et al.  Evaluation of diagnostic systems : methods from signal detection theory , 1982 .

[2]  Leonard M. Adleman,et al.  An Abstract Theory of Computer Viruses , 1988, CRYPTO.

[3]  L. M. Adleman,et al.  An abstract theory of computer viruses (invited talk) , 1990, CRYPTO 1990.

[4]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[5]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[6]  P. Langley Selection of Relevant Features in Machine Learning , 1994 .

[7]  Karl N. Levitt,et al.  MCF: a malicious code filter , 1995, Comput. Secur..

[8]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[9]  Anil K. Jain,et al.  Statistical Pattern Recognition: A Review , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[11]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[12]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[13]  Gary McGraw,et al.  Attacking Malicious Code: A Report to the Infosec Research Council , 2000, IEEE Software.

[14]  Salvatore J. Stolfo,et al.  Data mining methods for detection of new malicious executables , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[15]  Chong-Ho Choi,et al.  Input Feature Selection by Mutual Information Based on Parzen Window , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Qiang Yang,et al.  Objective-oriented utility-based association mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[17]  Chih-Jen Lin,et al.  A comparison of methods for multiclass support vector machines , 2002, IEEE Trans. Neural Networks.

[18]  Somesh Jha,et al.  Static Analysis of Executables to Detect Malicious Patterns , 2003, USENIX Security Symposium.

[19]  Jau-Hwang Wang,et al.  Virus detection using data mining techinques , 2003, IEEE 37th Annual 2003 International Carnahan Conference onSecurity Technology, 2003. Proceedings..

[20]  Fan Ming Mining Frequent Patterns in an FP-tree Without Conditional FP-tree Generation , 2003 .

[21]  Jesse C. Rabek,et al.  Detection of injected, dynamically generated, and obfuscated malicious code , 2003, WORM '03.

[22]  Jiawei Han,et al.  CPAR: Classification based on Predictive Association Rules , 2003, SDM.

[23]  Mingtian Zhou,et al.  Some Further Theoretical Results about Computer Viruses , 2004, Comput. J..

[24]  Andrew H. Sung,et al.  Static analyzer of vicious executables (SAVE) , 2004, 20th Annual Computer Security Applications Conference.

[25]  A.H. Sung,et al.  Polymorphic malicious executable scanner by API sequence analysis , 2004, Fourth International Conference on Hybrid Intelligent Systems (HIS'04).

[26]  Marcus A. Maloof,et al.  Learning to detect malicious executables in the wild , 2004, KDD.

[27]  Éric Filiol Computer Viruses: from Theory to Applications , 2005 .

[28]  Zhi-hong Zuo,et al.  On the time complexity of computer viruses , 2005, IEEE Transactions on Information Theory.

[29]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Eric Filiol,et al.  Malware Pattern Scanning Schemes Secure Against Black-box Analysis , 2006, Journal in Computer Virology.

[31]  Eric Filiol,et al.  Evaluation methodology and theoretical model for antiviral behavioural detection strategies , 2007, Journal in Computer Virology.

[32]  rey O. Kephart,et al.  Automatic Extraction of Computer Virus SignaturesJe , 2006 .

[33]  Jiawei Han,et al.  Discriminative Frequent Pattern Analysis for Effective Classification , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[34]  Yanfang Ye,et al.  IMDS: intelligent malware detection system , 2007, KDD '07.

[35]  R. Suganya,et al.  Data Mining Concepts and Techniques , 2010 .