Data Mining Approach for Developing Various Models Based on Types of Attack and Feature Selection as Intrusion Detection Systems (IDS)

Information security is concerned with protecting data and information from unauthorized access. Classification techniques play an important role in information security by labeling data as normal or attack. Network traffic today includes a large amount of irrelevant information that increases classifier complexity and degrades classification results, so a robust model that classifies the data with high accuracy is needed. In this paper, various classification techniques are applied to the NSL-KDD data set with ten-fold cross-validation from two viewpoints. First, the techniques are applied to a two-class problem as binary classification (normal and attack); second, they are applied to a five-class problem as multiclass classification. Empirical results show that the random forest technique outperforms the other techniques on both the two-class and the five-class problem on the NSL-KDD data set. Because of the large amount of redundant data, we also apply feature selection techniques to the random forest model, which is the best model both as a binary classifier and as a multiclass classifier. The model produces its highest accuracy with 15 features in the case of binary classification. The performance of the various models is also evaluated using other measures such as true-positive rate (TPR), false-positive rate (FPR), precision, F-measure and the receiver operating characteristic (ROC) curve, and the results are found to be satisfactory.
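The evaluation measures named above can be illustrated with a minimal sketch for the binary (normal versus attack) case. It uses the standard textbook definitions of TPR, FPR, precision and F-measure; the abstract itself does not give formulas. Treating "attack" as the positive class is an assumption, and the function names `count_outcomes` and `binary_metrics` and the toy labels are hypothetical, not taken from the paper or from NSL-KDD.

```python
def count_outcomes(y_true, y_pred, positive="attack"):
    """Count confusion-matrix cells, treating `positive` as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(counts):
    """Compute TPR, FPR, precision and F-measure from confusion counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    tpr = _ratio(tp, tp + fn)        # true-positive rate (recall)
    fpr = _ratio(fp, fp + tn)        # false-positive rate
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * tpr, precision + tpr)
    return {"TPR": tpr, "FPR": fpr, "precision": precision, "F-measure": f_measure}


# Toy labels for illustration only (not NSL-KDD records).
y_true = ["attack"] * 4 + ["normal"] * 6
y_pred = ["attack", "attack", "attack", "normal"] + ["attack"] + ["normal"] * 5

for name, value in binary_metrics(count_outcomes(y_true, y_pred)).items():
    print(f"{name}: {value:.3f}")
```

For the five-class setting, the same counts could be computed once per class in a one-versus-rest fashion; whether the paper averages per-class values that way is not stated in the abstract.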