A Geometric Framework for Unsupervised Anomaly Detection

Most current intrusion detection systems employ signature-based methods or data mining-based methods which rely on labeled training data. This training data is typically expensive to produce. We present a new geometric framework for unsupervised anomaly detection, which are algorithms that are designed to process unlabeled data. In our framework, data elements are mapped to a feature space which is typically a vector space ℛd. Anomalies are detected by determining which points lies in sparse regions of the feature space. We present two feature maps for mapping data elements to a feature space. Our first map is a data-dependent normalization feature map which we apply to network connections. Our second feature map is a spectrum kernel which we apply to system call traces. We present three algorithms for detecting which points lie in sparse regions of the feature space. We evaluate our methods by performing experiments over network records from the KDD CUP 1999 data set and system call traces from the 1999 Lincoln Labs DARPA evaluation.

[1]  Leonid Portnoy,et al.  Intrusion detection with unlabeled data using clustering , 2000 .

[2]  Andrew McCallum,et al.  Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.

[3]  Barak A. Pearlmutter,et al.  Detecting intrusions using system calls: alternative data models , 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).

[4]  Raymond T. Ng,et al.  Finding Intensional Knowledge of Distance-Based Outliers , 1999, VLDB.

[5]  Paul Helman,et al.  A statistically based system for prioritizing information exploration under uncertainty , 1997, IEEE Trans. Syst. Man Cybern. Part A.

[6]  Anup K. Ghosh,et al.  A Study in Using Neural Networks for Anomaly and Misuse Detection , 1999, USENIX Security Symposium.

[7]  Alexander J. Smola,et al.  Advances in Large Margin Classifiers , 2000 .

[8]  Philip K. Chan,et al.  Learning Patterns from Unix Process Execution Traces for Intrusion Detection , 1997 .

[9]  Stephanie Forrest,et al.  Intrusion Detection Using Sequences of System Calls , 1998, J. Comput. Secur..

[10]  Nong Ye,et al.  A Markov Chain Model of Temporal Behavior for Anomaly Detection , 2000 .

[11]  Eleazar Eskin,et al.  The Spectrum Kernel: A String Kernel for SVM Protein Classification , 2001, Pacific Symposium on Biocomputing.

[12]  Dorothy E. Denning,et al.  An Intrusion-Detection Model , 1987, IEEE Transactions on Software Engineering.

[13]  Salvatore J. Stolfo,et al.  Detecting sound events in basketball video archive , 2001 .

[14]  Eleazar Eskin,et al.  Anomaly Detection over Noisy Data using Learned Probability Distributions , 2000, ICML.

[15]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[16]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[17]  Vic Barnett,et al.  Outliers in Statistical Data , 1980 .

[18]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[19]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD 2000.

[20]  Salvatore J. Stolfo,et al.  Data Mining Approaches for Intrusion Detection , 1998, USENIX Security Symposium.

[21]  Salvatore J. Stolfo,et al.  Modeling system calls for intrusion detection with dynamic window sizes , 2001, Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX'01.

[22]  T. Lane,et al.  Sequence Matching and Learning in Anomaly Detection for Computer Security , 1997 .

[23]  Harold S. Javitz,et al.  The NIDES Statistical Component Description and Justification , 1994 .

[25]  Bernhard Schölkopf,et al.  Dynamic Alignment Kernels , 2000 .

[26]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[27]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[28]  Nello Cristianini,et al.  An introduction to Support Vector Machines , 2000 .

[29]  David Haussler,et al.  Convolution kernels on discrete structures , 1999 .

[30]  Ron Kohavi,et al.  The Case against Accuracy Estimation for Comparing Induction Algorithms , 1998, ICML.