Data Dimensionality Reduction (DDR) Scheme for Intrusion Detection System Using Ensemble and Standalone Classifiers

The IT sector continues to grow rapidly, and the number of devices connected through the Internet has increased tremendously, resulting in Big Data issues, longer computation times, and a higher rate of malicious activity. To provide stronger security, Intrusion Detection Systems (IDS) were introduced, and they have played a major role in security over the past few years. Developing a more efficient IDS requires exploring several data mining strategies in the domain of data analytics. A fundamental problem encountered in that domain is high-dimensional data. Hence, a Data Dimensionality Reduction (DDR) scheme is proposed that minimizes the number of features, dimensions, and tuples in the training set in order to increase IDS detection rates. The proposed scheme is evaluated with two approaches: an ensemble approach and a standalone classifier approach. The datasets used in the experiments are the benchmark NSL-KDD dataset and the more recent CICIDS 2017 intrusion dataset.
