K-Means+ID3: A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods

In this paper, we present "k-means+ID3", a method to cascade k-means clustering and the ID3 decision tree learning methods for classifying anomalous and normal activities in a computer network, an active electronic circuit, and a mechanical mass-beam system. The k-means clustering method first partitions the training instances into k clusters using Euclidean distance similarity. On each cluster, representing a density region of normal or anomaly instances, we build an ID3 decision tree. The decision tree on each cluster refines the decision boundaries by learning the subgroups within the cluster. To obtain a final decision on classification, the decisions of the k-means and ID3 methods are combined using two rules: 1) the nearest-neighbor rule and 2) the nearest-consensus rule. We perform experiments on three data sets: 1) network anomaly data (NAD), 2) Duffing equation data (DED), and 3) mechanical system data (MSD), which contain measurements from three distinct application domains of computer networks, an electronic circuit implementing a forced Duffing equation, and a mechanical system, respectively. Results show that the detection accuracy of the k-means+ID3 method is as high as 96.24 percent at a false-positive-rate of 0.03 percent on NAD; the total accuracy is as high as 80.01 percent on MSD and 79.9 percent on DED

[1]  Connie M. Borror,et al.  Robustness of the Markov-chain model for cyber-attack detection , 2004, IEEE Transactions on Reliability.

[2]  Tom Fawcett,et al.  ROC Graphs: Notes and Practical Considerations for Data Mining Researchers , 2003 .

[3]  Itzhak Levin,et al.  KDD-99 classifier learning contest LLSoft's results overview , 2000, SKDD.

[4]  Jonatan Gómez,et al.  Evolving Fuzzy Classifiers for Intrusion Detection , 2002 .

[5]  R.K. Cunningham,et al.  Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation , 2000, Proceedings DARPA Information Survivability Conference and Exposition. DISCEX'00.

[6]  Christopher Krügel,et al.  Anomaly detection of web-based attacks , 2003, CCS '03.

[7]  David G. Stork,et al.  Pattern Classification , 1973 .

[8]  Jaideep Srivastava,et al.  A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection , 2003, SDM.

[9]  RayAsok Symbolic dynamic analysis of complex systems for anomaly detection , 2004 .

[10]  Harold S. Javitz,et al.  The SRI IDES statistical anomaly detector , 1991, Proceedings. 1991 IEEE Computer Society Symposium on Research in Security and Privacy.

[11]  Qiang Chen,et al.  Multivariate Statistical Analysis of Audit Trails for Host-Based Intrusion Detection , 2002, IEEE Trans. Computers.

[12]  Lee M. Rossey,et al.  Extending the DARPA off-line intrusion detection evaluations , 2001, Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX'01.

[13]  Christopher Krügel,et al.  Anomalous system call detection , 2006, TSEC.

[14]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Antanas Verikas,et al.  Soft combination of neural classifiers: A comparative study , 1999, Pattern Recognit. Lett..

[16]  Richard Lippmann,et al.  The 1999 DARPA off-line intrusion detection evaluation , 2000, Comput. Networks.

[17]  Li Jun,et al.  HIDE: a Hierarchical Network Intrusion Detection System Using Statistical Preprocessing and Neural Network Classification , 2001 .

[18]  V.V. Phoha,et al.  Dimension reduction using feature extraction methods for real-time misuse detection systems , 2004, Proceedings from the Fifth Annual IEEE SMC Information Assurance Workshop, 2004..

[19]  Asok Ray,et al.  Symbolic time series analysis for anomaly detection: A comparative evaluation , 2005, Signal Process..

[20]  Salim Hariri,et al.  A new dependency and correlation analysis for features , 2005, IEEE Transactions on Knowledge and Data Engineering.

[21]  Charles X. Ling,et al.  Using AUC and accuracy in evaluating learning algorithms , 2005, IEEE Transactions on Knowledge and Data Engineering.

[22]  Marina Thottan,et al.  Anomaly detection in IP networks , 2003, IEEE Trans. Signal Process..

[23]  Dit-Yan Yeung,et al.  Parzen-window network intrusion detectors , 2002, Object recognition supported by user interaction for service robots.

[24]  Ramesh C. Agarwal,et al.  PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection) , 2001, SDM.

[25]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[26]  S. T. Sarasamma,et al.  Hierarchical Kohonenen net for anomaly detection in network security , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[27]  Richard Lippmann,et al.  Analysis and Results of the 1999 DARPA Off-Line Intrusion Detection Evaluation , 2000, Recent Advances in Intrusion Detection.

[28]  A. Khatkhate,et al.  Symbolic time-series analysis for anomaly detection in mechanical systems , 2006, IEEE/ASME Transactions on Mechatronics.

[29]  Ludmila I. Kuncheva,et al.  Switching between selection and fusion in combining classifiers: an experiment , 2002, IEEE Trans. Syst. Man Cybern. Part B.

[30]  Asok Ray,et al.  Symbolic dynamic analysis of complex systems for anomaly detection , 2004, Signal Process..

[31]  Eugene H. Spafford,et al.  A PATTERN MATCHING MODEL FOR MISUSE INTRUSION DETECTION , 1994 .