A Comparison of Test Statistics for Computer Intrusion Detection Based on Principal Components Regre

One method of detecting an unauthorized user masquerading as a registered user is to compare in real time the sequence of commands given by each user to a profile of that user's past behavior. Our profiles are derived from each user's historical one-step transition probabilities of Unix commands. We compare various statistics for testing the null hypothesis that the observed command transition probabilities come from a profiled transition matrix, which is a smoothed version of historical transition counts. The primary statistical difficulty comes from the large sparse nature of the transition count matrix. Our example is based on the 100 most frequent commands. Hence we infer about a 100 by 100 matrix of transition probabilities, although most of these transitions will be unobserved in the test data. Different test statistics are formed by varying the amount of smoothing of the transition probabilities in the training data, and by using different theoretical test criteria. To reduce the dimensionality of the test statistics, the alternative hypothesis is based on a principal component regression model. Using example data from a population of 45 (mostly research) users on a single computer, we compute error rates and ROC curves for the various test statistics when each user's test data is compared to their own and to other users' training data. We also discuss implementation issues such as storage and computational requirements.
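
To make the profiling idea concrete, the following is a minimal sketch, not the paper's method. It assumes additive smoothing with a hypothetical constant `alpha` and scores test data by average negative log-likelihood, a simple stand-in for the test statistics compared in the paper. It omits the principal component regression step entirely. The function names, the four-command vocabulary, and the toy command sequences are all illustrative assumptions.

```python
from collections import Counter
import math


def transition_counts(commands):
    """Count one-step transitions (previous command -> next command)."""
    return Counter(zip(commands, commands[1:]))


def smoothed_profile(commands, vocab, alpha=0.5):
    """Build a row-normalized transition matrix with additive smoothing.

    alpha is a hypothetical smoothing constant, not a value from the paper.
    """
    counts = transition_counts(commands)
    profile = {}
    for prev in vocab:
        row_total = sum(counts[(prev, nxt)] for nxt in vocab) + alpha * len(vocab)
        profile[prev] = {
            nxt: (counts[(prev, nxt)] + alpha) / row_total for nxt in vocab
        }
    return profile


def avg_neg_log_likelihood(test_commands, profile):
    """Average negative log-probability of the test transitions under a profile.

    Transitions involving commands outside the profile's vocabulary are
    skipped (an assumption of this sketch). Lower scores mean a better fit.
    """
    pairs = [
        (prev, nxt)
        for prev, nxt in zip(test_commands, test_commands[1:])
        if prev in profile and nxt in profile[prev]
    ]
    if not pairs:
        return float("nan")
    return -sum(math.log(profile[prev][nxt]) for prev, nxt in pairs) / len(pairs)


# Toy data: two users with opposite command habits (illustrative only).
vocab = ["ls", "cd", "vi", "make"]
train_a = ["ls", "cd", "vi", "make"] * 10
train_b = ["make", "vi", "cd", "ls"] * 10
test_a = ["ls", "cd", "vi", "make"] * 3

profile_a = smoothed_profile(train_a, vocab)
profile_b = smoothed_profile(train_b, vocab)

score_own = avg_neg_log_likelihood(test_a, profile_a)
score_other = avg_neg_log_likelihood(test_a, profile_b)
print(score_own < score_other)  # user A's test data fits A's own profile better
```

In this sketch, smoothing keeps unobserved transitions from receiving zero probability, which matters because most cells of a sparse transition matrix are empty. Varying `alpha` is one simple way to vary the amount of smoothing.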
