A COST SENSITIVE LEARNING METHOD TO TUNE THE NEAREST NEIGHBOUR FOR INTRUSION DETECTION

In this paper, a novel cost-sensitive learning algorithm is proposed to improve the performance of the nearest neighbor classifier for intrusion detection. The goal of the learning algorithm is to minimize the total cost of leave-one-out classification of the given training set. This matters because, in intrusion detection, different misclassifications carry different costs. To optimize the nearest neighbor classifier for intrusion detection, the distance function is defined in a parametric form. The free parameters of the distance function (i.e., the weights of features and instances) are adjusted by our proposed feature-weighting and instance-weighting algorithms. The proposed feature-weighting algorithm can be viewed as a general-purpose wrapper approach to feature weighting. The instance-weighting algorithm is designed to remove noisy and redundant instances from the training set. This, in turn, improves the speed and performance of the nearest neighbor classifier in the generalization phase, which is important in real-time applications such as intrusion detection. Using the KDD99 dataset, we show that the scheme is effective in designing a cost-sensitive nearest neighbor classifier for intrusion detection.
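The objective described above, the total leave-one-out misclassification cost under a parametric distance, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the form of the distance, so the weighted squared Euclidean distance divided by an instance weight, the function names `weighted_distance` and `loo_total_cost`, the toy data, and the cost matrix are all assumptions made for illustration. The sketch only evaluates the cost for given weights; the proposed procedures for adjusting the weights are not reproduced.

```python
import math
from typing import List, Sequence


def weighted_distance(x: Sequence[float], z: Sequence[float],
                      feature_w: Sequence[float], instance_w: float) -> float:
    """Assumed parametric distance from query x to stored instance z.

    Feature weights scale each squared difference; the stored instance's
    weight divides the result, so a weight of 0 removes the instance.
    """
    if instance_w <= 0.0:
        return math.inf  # instance is effectively deleted from the training set
    d = sum(w * (a - b) ** 2 for w, a, b in zip(feature_w, x, z))
    return d / instance_w


def loo_total_cost(X: Sequence[Sequence[float]], y: Sequence[int],
                   feature_w: Sequence[float], instance_w: Sequence[float],
                   cost: Sequence[Sequence[float]]) -> float:
    """Total cost of leave-one-out 1-NN classification of the training set.

    cost[t][p] is the cost of predicting class p when the true class is t.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(X, y)):
        best_d, best_label = math.inf, None
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j == i:
                continue  # leave the query instance out
            d = weighted_distance(xi, xj, feature_w, instance_w[j])
            if d < best_d:
                best_d, best_label = d, yj
        if best_label is not None:  # skipped if every other instance is removed
            total += cost[yi][best_label]
    return total


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X: List[Sequence[float]] = [(0.0, 0.0), (0.1, 5.0), (1.0, 0.2), (0.9, 5.1)]
y = [0, 0, 1, 1]                      # 0 = normal, 1 = attack (assumed labels)
cost = [[0.0, 1.0],                   # false alarm costs 1 (assumed)
        [5.0, 0.0]]                   # missed attack costs 5 (assumed)

uniform = loo_total_cost(X, y, [1.0, 1.0], [1.0] * 4, cost)
tuned = loo_total_cost(X, y, [1.0, 0.0], [1.0] * 4, cost)
print(uniform, tuned)
```

On this toy set, setting the weight of the noisy feature to zero lowers the leave-one-out cost, which is the kind of improvement a wrapper-style feature-weighting search would look for.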
