Handling nominal features in anomaly intrusion detection problems

Computer network data stream used in intrusion detection usually involve many data types. A common data type is that of symbolic or nominal features. Whether being coded into numerical values or not, nominal features need to be treated differently from numeric features. This paper studies the effectiveness of two approaches in handling nominal features: a simple coding scheme via the use of indicator variables and a scaling method based on multiple correspondence analysis (MCA). In particular, we apply the techniques with two anomaly detection methods: the principal component classifier (PCC) and the Canberra metric. The experiments with KDD 1999 data demonstrate that MCA works better than the indicator variable approach for both detection methods with the PCC coming much ahead of the Canberra metric.

[1]  Shu-Ching Chen,et al.  Principal Component-based Anomaly Detection Scheme , 2006, Foundations and Novel Approaches in Data Mining.

[2]  Sameer Singh,et al.  Novelty detection: a review - part 1: statistical approaches , 2003, Signal Process..

[3]  H. Javitz,et al.  Detecting Unusual Program Behavior Using the Statistical Component of the Next-generation Intrusion Detection Expert System ( NIDES ) 1 , 1997 .

[4]  Michael H. Kutner Applied Linear Statistical Models , 1974 .

[5]  Martin Roesch,et al.  Snort - Lightweight Intrusion Detection for Networks , 1999 .

[6]  Charles E. Heckler,et al.  Applied Multivariate Statistical Analysis , 2005, Technometrics.

[7]  J. P. Ed,et al.  Transmission control protocol- darpa internet program protocol specification , 1981 .

[8]  Sameer Singh,et al.  Novelty detection: a review - part 2: : neural network based approaches , 2003, Signal Process..

[9]  Salvatore J. Stolfo,et al.  A framework for constructing features and models for intrusion detection systems , 2000, TSEC.

[10]  Philip K. Chan,et al.  Learning nonstationary models of normal network traffic for detecting novel attacks , 2002, KDD.

[11]  Stuart Staniford-Chen,et al.  Practical Automated Detection of Stealthy Portscans , 2002, J. Comput. Secur..

[12]  M. Shyu,et al.  A Novel Anomaly Detection Scheme Based on Principal Component Classifier , 2003 .

[13]  Leonid Portnoy,et al.  Intrusion detection with unlabeled data using clustering , 2000 .

[14]  Sushil Jajodia,et al.  Detecting Novel Network Intrusions Using Bayes Estimators , 2001, SDM.

[15]  R. Clarke,et al.  Theory and Applications of Correspondence Analysis , 1985 .

[16]  G Hilmi,et al.  HIERARCHICAL SELF ORGANIZING MAP BASED IDS ON KDD BENCHMARK , 2003 .

[17]  Naiwen Ye,et al.  Robustness of Canberra Metric in Computer Intrusion Detection W , 2001 .

[18]  Khaled Labib,et al.  NSOM: A Real-Time Network-Based Intrusion Detection System Using Self-Organizing Maps , 2002 .

[19]  Gürsel Serpen,et al.  Application of Machine Learning Algorithms to KDD Intrusion Detection Dataset within Misuse Detection Context , 2003, MLMTA.

[20]  V. Barnett,et al.  Applied Linear Statistical Models , 1975 .

[21]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[22]  Lawrence O. Hall,et al.  Knowledge based (re-)clustering , 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5).

[23]  Jonatan Gómez,et al.  Evolving Fuzzy Classifiers for Intrusion Detection , 2002 .