Weighted Threshold-Based Clustering for Intrusion Detection Systems

Signature-based intrusion detection systems look for known, suspicious patterns in the input data. In this paper we explore compression of labeled empirical data using threshold-based clustering with regularization. The main target of clustering is to compress training dataset to the limited number of signatures, and to minimize the number of comparisons that are necessary to determine the status of the input event as a result. Essentially, the process of clustering includes merging of the clusters which are close enough. As a consequence, we will reduce original dataset to the limited number of labeled centroids. In a complex with k-nearest-neighbor (kNN) method, this set of centroids may be used as a multi-class classifier. Clearly, different attributes have different importance depending on the particular training database and given cost matrix. This importance may be regulated in the definition of the distance using linear weight coefficients. The paper introduces special procedure to estimate above weight coefficients. The experiments on the KDD-99 intrusion detection dataset have confirmed the effectiveness of the proposed methods.

[1]  Sameer Singh,et al.  Novelty detection: a review - part 1: statistical approaches , 2003, Signal Process..

[2]  Vladimir Nikulin Threshold-based Clustering with Hard Regularization for Intrusion Detection Systems , 2005 .

[3]  Don R. Hush,et al.  A Classification Framework for Anomaly Detection , 2005, J. Mach. Learn. Res..

[4]  Salvatore J. Stolfo,et al.  Anomalous Payload-Based Worm Detection and Signature Generation , 2005, RAID.

[5]  Dit-Yan Yeung,et al.  Parzen-window network intrusion detectors , 2002, Object recognition supported by user interaction for service robots.

[6]  Ramesh C. Agarwal,et al.  PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection) , 2001, SDM.

[7]  Fabio A. González,et al.  Anomaly Detection Using Real-Valued Negative Selection , 2003, Genetic Programming and Evolvable Machines.

[8]  Wenke Lee,et al.  Statistical Causality Analysis of INFOSEC Alert Data , 2003, RAID.

[9]  Salvatore J. Stolfo,et al.  Mining Audit Data to Build Intrusion Detection Models , 1998, KDD.

[10]  A. Karr,et al.  Computer Intrusion: Detecting Masquerades , 2001 .

[11]  J. Friedman Clustering objects on subsets of attributes , 2002 .

[12]  Boleslaw K. Szymanski,et al.  Intrusion detection: a bioinformatics approach , 2003, 19th Annual Computer Security Applications Conference, 2003. Proceedings..

[13]  Olfa Nasraoui,et al.  A New Gravitational Clustering Algorithm , 2003, SDM.

[14]  Gürsel Serpen,et al.  Application of Machine Learning Algorithms to KDD Intrusion Detection Dataset within Misuse Detection Context , 2003, MLMTA.

[15]  Vladimir Nikulin Universal Clustering with Family of Power Loss Functions in Probabilistic Space , 2005, IDEAL.

[16]  Leonid Portnoy,et al.  Intrusion detection with unlabeled data using clustering , 2000 .

[17]  Charles Elkan,et al.  Results of the KDD'99 classifier learning , 2000, SKDD.

[18]  Richard Lippmann,et al.  Analysis and Results of the 1999 DARPA Off-Line Intrusion Detection Evaluation , 2000, Recent Advances in Intrusion Detection.

[19]  Alexander J. Smola,et al.  Parametric model-based clustering , 2005, SPIE Defense + Commercial Sensing.

[20]  Philip K. Chan,et al.  An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection , 2003, RAID.

[21]  Bernhard Pfahringer,et al.  Winning the KDD99 classification cup: bagged boosting , 2000, SKDD.

[22]  D. Hand,et al.  A k-nearest-neighbour classifier for assessing consumer credit risk , 1996 .

[23]  Christopher Krügel,et al.  Bayesian event classification for intrusion detection , 2003, 19th Annual Computer Security Applications Conference, 2003. Proceedings..

[24]  Sanjiv K. Bhatia Adaptive K-Means Clustering , 2004, FLAIRS Conference.

[25]  Itzhak Levin,et al.  KDD-99 classifier learning contest LLSoft's results overview , 2000, SKDD.

[26]  Vincent Kanade,et al.  Clustering Algorithms , 2021, Wireless RF Energy Transfer in the Massive IoT Era.

[27]  Christopher Krügel,et al.  Using Decision Trees to Improve Signature-Based Intrusion Detection , 2003, RAID.

[28]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[29]  J. Friedman,et al.  Clustering objects on subsets of attributes (with discussion) , 2004 .

[30]  Jaideep Srivastava,et al.  A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection , 2003, SDM.

[31]  Sameer Singh,et al.  Novelty detection: a review - part 2: : neural network based approaches , 2003, Signal Process..