Evaluating indirect and direct classification techniques for network intrusion detection

The network intrusion detection domain has seen increasing research that exploits data mining and machine learning techniques and principles. Typically, multi-category classification models are built to classify each network traffic instance either as normal or as belonging to a specific attack category. Many existing works on data mining for intrusion detection have applied direct classification methods; to our knowledge, indirect classification techniques have not been investigated for intrusion detection. Direct classification techniques generally extend their associated binary classifiers so that they handle the multi-category problem as a whole. In contrast, an indirect classification technique decomposes the original multi-category problem into multiple binary classification problems, a step called binarization. The classification technique used to train each of the resulting binary classifiers is called the base classifier, and a combining strategy then merges the outputs of the binary classifiers into a single multi-category prediction. We investigate two binarization techniques and three combining strategies, yielding six indirect classification methods. This study presents a comprehensive comparison of five direct classification methods with thirty indirect classification models (the six indirect classification methods applied with each of the five base classifiers). To our knowledge, no existing work evaluates as many indirect classification techniques and compares them with direct classification methods, particularly for network intrusion detection. A case study of the DARPA KDD-1999 offline intrusion detection project is used to evaluate the different techniques. The empirical results show that certain indirect classification techniques yield better network intrusion detection models.
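The decompose-train-combine pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's method: the abstract does not name its two binarization techniques, three combining strategies, or five base classifiers, so the sketch assumes one-versus-rest and pairwise (one-versus-one, or round robin) decomposition, a toy nearest-centroid learner as a stand-in base classifier, and one simple combining rule per scheme. All function names (`nearest_centroid`, `one_vs_rest`, `one_vs_one`, `combine_max_score`, `combine_majority_vote`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: OvR and OvO stand in for the paper's two (unnamed) binarization schemes.
Label = str                          # class labels, e.g. "normal" or an attack category
Vector = Sequence[float]
Scorer = Callable[[Vector], float]   # score > 0 means "positive class"
# A base learner trains on (X, y) with y in {+1, -1} and returns a scorer.
BaseLearner = Callable[[List[Vector], List[int]], Scorer]


def nearest_centroid(X: List[Vector], y: List[int]) -> Scorer:
    """Hypothetical stand-in base classifier: compare squared distances to the
    centroids of the positive and negative training examples."""
    def centroid(label: int) -> List[float]:
        pts = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(pts) for col in zip(*pts)]

    def sq_dist(x: Vector, c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))

    pos, neg = centroid(1), centroid(-1)
    # Positive score when x lies closer to the positive centroid.
    return lambda x: sq_dist(x, neg) - sq_dist(x, pos)


def one_vs_rest(X: List[Vector], y: List[Label],
                learner: BaseLearner) -> Dict[Label, Scorer]:
    """Binarize a K-class problem into K problems: class k versus all others."""
    return {k: learner(X, [1 if t == k else -1 for t in y])
            for k in sorted(set(y))}


def one_vs_one(X: List[Vector], y: List[Label],
               learner: BaseLearner) -> Dict[Tuple[Label, Label], Scorer]:
    """Binarize into K(K-1)/2 pairwise problems, each trained only on the
    examples of its two classes (round robin decomposition)."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        idx = [i for i, t in enumerate(y) if t in (a, b)]
        models[(a, b)] = learner([X[i] for i in idx],
                                 [1 if y[i] == a else -1 for i in idx])
    return models


def combine_max_score(models: Dict[Label, Scorer], x: Vector) -> Label:
    """One-vs-rest combining rule: pick the class whose scorer is largest."""
    return max(models, key=lambda k: models[k](x))


def combine_majority_vote(models: Dict[Tuple[Label, Label], Scorer],
                          x: Vector) -> Label:
    """Pairwise combining rule: each pairwise model votes for one class;
    ties are broken in favour of the alphabetically smaller label."""
    votes: Counter = Counter()
    for (a, b), score in models.items():
        votes[a if score(x) > 0 else b] += 1
    return max(sorted(votes), key=lambda k: votes[k])


# Toy, synthetic 2-D data (NOT the KDD-1999 features), three classes.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["normal", "normal", "dos", "dos", "probe", "probe"]

if __name__ == "__main__":
    ovr = one_vs_rest(X, y, nearest_centroid)
    ovo = one_vs_one(X, y, nearest_centroid)
    for point in ([0.2, 0.4], [9.8, 0.6]):
        print(point, combine_max_score(ovr, point),
              combine_majority_vote(ovo, point))
    # prints:
    # [0.2, 0.4] normal normal
    # [9.8, 0.6] probe probe
```

Because the base learner is passed in as a plain function, swapping in a different learner changes the base classifier without touching the decomposition or combining code. That separation is what makes a grid of indirect models easy to enumerate: two binarizations times three combining strategies gives six indirect methods per base classifier, and five base classifiers give the thirty models compared in the study.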
