Associative classification and post-processing techniques used for malware detection

Numerous malware attacks pose serious threats to the security of computer users. Unfortunately, as malware-writing techniques develop, the number of file samples that must be analyzed keeps increasing every day. An automatic and robust tool for analyzing and classifying these samples is urgently needed. In this paper, based on an analysis of the Windows API execution sequences called by PE files, we apply associative classification and post-processing techniques to malware detection. Promising experimental results demonstrate that our malware detection method outperforms, in both accuracy and efficiency, popular anti-virus scanners such as Norton AntiVirus and Dr. Web, as well as previous data-mining-based detection systems that employed Naive Bayes, Support Vector Machine (SVM), and Decision Tree techniques. In particular, the post-processing techniques we adopt can greatly reduce the number of generated rules, making it easier for human analysts to identify the useful ones.
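The abstract does not spell out how rules are mined, ranked, or pruned, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each PE file is reduced to the *set* of Windows API names it calls (call order is ignored), mines class association rules by brute-force enumeration of itemsets up to length 2 (standing in for a real miner such as FP-growth), uses one common post-processing step (dropping a rule when a more general rule for the same class has at least equal confidence), and labels an unseen file with the first matching rule ranked by confidence, then support. The training samples, thresholds, and function names (`mine_rules`, `prune_redundant`, `classify`) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Set, Tuple

Sample = Tuple[Set[str], str]  # (API names called by one PE file, class label)


class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # API calls that must all be present
    label: str                  # predicted class
    support: float              # fraction of samples with antecedent AND label
    confidence: float           # P(label | antecedent) on the training set


def mine_rules(samples: List[Sample], min_support: float = 0.2,
               min_confidence: float = 0.6, max_len: int = 2) -> List[Rule]:
    """Brute-force class association rule mining over API-call itemsets.

    Enumerates every itemset of up to max_len items per sample; fine for a
    toy example, but a real miner (e.g. FP-growth) is needed at scale.
    """
    n = len(samples)
    itemset_counts: Counter = Counter()  # samples containing the itemset
    rule_counts: Counter = Counter()     # ... that also carry a given label
    for apis, label in samples:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(apis), k):
                items = frozenset(combo)
                itemset_counts[items] += 1
                rule_counts[(items, label)] += 1
    rules = []
    for (items, label), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[items]
        if support >= min_support and confidence >= min_confidence:
            rules.append(Rule(items, label, support, confidence))
    return rules


def prune_redundant(rules: List[Rule]) -> List[Rule]:
    """Assumed post-processing: drop a rule when a more general rule (proper
    subset antecedent) predicts the same class with at least equal confidence."""
    return [
        r for r in rules
        if not any(g.label == r.label and g.antecedent < r.antecedent
                   and g.confidence >= r.confidence for g in rules)
    ]


def classify(rules: List[Rule], apis: Set[str], default: str = "benign") -> str:
    """Label a file with the first matching rule, ranked by confidence, then
    support, then shorter antecedent; fall back to a default class."""
    ranked = sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))
    for r in ranked:
        if r.antecedent <= apis:  # every API in the antecedent was called
            return r.label
    return default


# Hypothetical toy training data (not from the paper).
train: List[Sample] = [
    ({"CreateRemoteThread", "WriteProcessMemory", "OpenProcess"}, "malware"),
    ({"CreateRemoteThread", "WriteProcessMemory", "RegSetValueExA"}, "malware"),
    ({"WriteProcessMemory", "OpenProcess", "RegSetValueExA"}, "malware"),
    ({"CreateFileW", "ReadFile", "CloseHandle"}, "benign"),
    ({"CreateFileW", "WriteFile", "CloseHandle"}, "benign"),
    ({"ReadFile", "CloseHandle", "OpenProcess"}, "benign"),
]

rules = mine_rules(train)
pruned = prune_redundant(rules)
print(len(rules), "rules mined,", len(pruned), "kept after pruning")
# prints: 12 rules mined, 7 kept after pruning
print(classify(pruned, {"OpenProcess", "CloseHandle"}))
# prints: benign
```

On this toy data the redundancy check removes every two-item rule, because each is implied by a single-API rule of equal confidence; this illustrates how post-processing can shrink the rule set that an analyst has to inspect. Ranking by confidence before support is similar to the precedence ordering used in CBA-style associative classifiers; the ranking and pruning criteria actually used in the paper may differ.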
