Performance Analysis of Multiple Classifiers on the KDD Cup Dataset Using the WEKA Tool

Background/Objectives: The objective of this work is to identify the best of ten classification algorithms for classifying the connection records in the KDD Cup 20% training dataset as normal or abnormal, using the WEKA tool.

Methods/Statistical Analysis: The ten classification algorithms were applied to the KDD Cup 20% training dataset, which comprises 25192 instances, with 10-fold cross-validation as the experiment type. The tests were configured with the Paired T-Tester (corrected) at a significance level of 0.05. The comparison fields Percent_correct, fmeasure, irrecall, irprecision and auc (area under the ROC curve) were used for evaluation. Ranking and summary tests were also performed.

Findings: According to the results obtained with the WEKA Experimenter for the 10 classifiers on the KDD Cup 20% training dataset, the Random Forest classifier performs best on the comparison fields percent_correct, fmeasure and AUC (area under the ROC curve). The SimpleCart classifier ranks next to Random Forest on percent_correct and fmeasure, and it outperforms all other classifiers on irprecision. ZeroR is the worst classifier on every comparison field except irrecall. Thus, for the dataset used in this experiment, further detailed study could be restricted to five classifiers: Random Forest, SimpleCart, J48, Bagging and IBk. This should reduce computational time and improve the efficiency of classifying the KDD Cup 20% dataset.
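The Paired T-Tester (corrected) in the WEKA Experimenter applies Nadeau and Bengio's corrected resampled t-test. This test inflates the variance of the paired score differences by the test/train size ratio, because cross-validation folds share training data. The following minimal Python sketch shows the statistic for two classifiers. It makes several assumptions. The per-fold scores are made-up illustrative numbers, not results from this study. The function names are hypothetical and are not part of WEKA's API. The p-value comes from numerically integrating the Student-t density rather than from a statistics library. For 10-fold cross-validation, the test/train ratio is (n/10)/(9n/10) = 1/9.

```python
import math
from statistics import mean, variance


def student_t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value, integrating the t density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = student_t_pdf(0.0, df) + student_t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def corrected_paired_t_test(scores_a, scores_b, test_train_ratio, alpha=0.05):
    """Corrected resampled t-test on paired per-fold scores (hypothetical helper).

    Returns (t statistic, two-tailed p-value, significant at alpha?).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)  # number of paired results (runs x folds)
    d_mean = mean(diffs)
    d_var = variance(diffs)  # sample variance, k - 1 denominator
    if d_var == 0:
        if d_mean == 0:
            return 0.0, 1.0, False
        return math.copysign(math.inf, d_mean), 0.0, True
    # Nadeau-Bengio correction: 1/k is replaced by (1/k + n_test/n_train).
    t = d_mean / math.sqrt((1 / k + test_train_ratio) * d_var)
    p = two_tailed_p(t, k - 1)
    return t, p, p < alpha


# Illustrative (made-up) percent-correct values from one run of 10-fold CV.
clf_a = [99.8, 99.7, 99.9, 99.8, 99.6, 99.8, 99.9, 99.7, 99.8, 99.7]
clf_b = [99.5, 99.4, 99.6, 99.5, 99.4, 99.6, 99.5, 99.3, 99.6, 99.4]
t_stat, p_value, significant = corrected_paired_t_test(clf_a, clf_b, 1 / 9)
print(f"t = {t_stat:.3f}, p = {p_value:.5f}, significant at 0.05: {significant}")
```

With more runs of cross-validation, as the WEKA Experimenter allows, the score lists simply grow to runs × folds paired entries, and the degrees of freedom grow with them.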