Sequence Matching and Learning in Anomaly Detection for Computer Security

Two important problems in computer security are (1) detecting an intruder masquerading as a valid user and (2) detecting abusive actions by an otherwise innocuous user. We have developed an approach to these problems that examines sequences of user actions (UNIX commands) to classify behavior as normal or anomalous. In this paper we explore the matching function needed to compare a current behavioral sequence to a historical profile. We discuss the difficulties of matching in human-generated data and show that exact string matching is insufficient for this domain. We demonstrate a number of partial matching functions and examine their behavior on user command data. In particular, we explore two methods for weighting scores by adjacency of matches, as well as two growth functions (polynomial and exponential) for scoring similarities. We find empirically that the optimal similarity measure is user dependent, but that measures based on the assumption of causal linkage between user commands are superior for this domain.
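To make the idea of adjacency-weighted partial matching concrete, the following is a minimal sketch, not the exact functions evaluated in the paper. It assumes fixed-length command sequences compared position by position, and it rewards runs of consecutive matches with either a polynomial (here linear) or an exponential growth function. The names `run_weighted_similarity`, `polynomial_growth`, `exponential_growth`, `exact_match`, and `similarity_to_profile`, and the sample command sequences, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def exact_match(a: Sequence[str], b: Sequence[str]) -> float:
    """Baseline: 1.0 only if the two sequences are identical."""
    return 1.0 if list(a) == list(b) else 0.0


def polynomial_growth(run: int) -> float:
    """Assumed polynomial growth: credit grows linearly with run length."""
    return float(run)


def exponential_growth(run: int) -> float:
    """Assumed exponential growth: credit doubles with each adjacent match."""
    return 2.0 ** (run - 1)


def run_weighted_similarity(
    a: Sequence[str],
    b: Sequence[str],
    growth: Callable[[int], float],
) -> float:
    """Score two equal-length sequences, rewarding adjacent matches.

    Each matching position earns growth(run), where run is the number of
    consecutive matches ending at that position. A mismatch resets the run.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    score = 0.0
    run = 0
    for x, y in zip(a, b):
        if x == y:
            run += 1
            score += growth(run)
        else:
            run = 0
    return score


def similarity_to_profile(
    seq: Sequence[str],
    profile: Sequence[Sequence[str]],
    growth: Callable[[int], float],
) -> float:
    """Assumed profile comparison: best score against any stored sequence."""
    return max(run_weighted_similarity(seq, p, growth) for p in profile)


current = ["ls", "cd", "vi", "make", "ls"]
historical = ["ls", "cd", "vi", "gcc", "ls"]

print(exact_match(current, historical))
print(run_weighted_similarity(current, historical, polynomial_growth))
print(run_weighted_similarity(current, historical, exponential_growth))
```

In this sketch, exact matching gives the two sample sequences no credit because of a single differing command, while both run-weighted scores still reflect the three-command run they share. That is the kind of behavior a partial matching function is meant to capture in noisy, human-generated data.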
