Rootkit (malicious code) prediction through data mining methods and techniques

Rootkits are software used to hide the presence and activity of malware and to let an attacker take control of a computer system by modifying the kernel. This paper explores the application of data mining methods to predict rootkits from attributes extracted from log files. The rootkit records were categorized as Inline or Others based on their attribute values. Nine classification algorithms were investigated to identify the most accurate and efficient classifier for rootkit prediction. The Correlation Bayes algorithm attained the highest prediction accuracy (87.4%) under 10-fold cross-validation. Moreover, in order to confirm the performance of the algorithm on unbalanced data, the Matthews Correlation Coefficient (MCC) was also calculated. The Correlation Bayes algorithm yielded the highest MCC, 0.679, on the Rootkit dataset.
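To illustrate why a correlation coefficient is reported alongside accuracy on unbalanced data, the following is a minimal sketch of the standard Matthews correlation coefficient computed from binary confusion-matrix counts. The function names and the example counts are hypothetical and are not taken from the paper; treating Inline as the positive class is likewise an assumption made only for illustration.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of records classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention),
    which happens when a whole row or column of the matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: a classifier that labels every record "Inline"
# on a set with 90 Inline and 10 Others records.
always_inline = dict(tp=90, tn=0, fp=10, fn=0)
print(accuracy(**always_inline))  # high accuracy despite learning nothing
print(mcc(**always_inline))       # MCC exposes the lack of discrimination
```

In this hypothetical case accuracy looks strong because the majority class dominates, while the MCC stays at zero, which is the motivation for reporting both figures on an unbalanced dataset.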
