Dynamic model selection with its applications to computer security

In recent years, there has been increasing interest in detecting anomalies in network traffic data and audit logs for computer security. For example, when a masquerader appears, a new anomalous behavior pattern may emerge in command-line data, and it is important to detect the emergence of such a pattern as early as possible. This paper addresses this anomaly detection problem by dynamically selecting statistical models from data. Unlike conventional statistical model selection, which selects a single model for the whole data set, our goal is to efficiently select a time series of optimal models, assuming that the true model may change over time. We call this approach dynamic model selection. We first propose a coding-theoretic criterion for dynamic model selection. Next, we propose two dynamic model selection algorithms that attain the minimum of this criterion, and we analyze their performance. Finally, we demonstrate the validity of our algorithms through a real application to masquerade detection using UNIX command sequences.
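The idea of minimizing a coding-theoretic criterion over a whole sequence of models can be illustrated with a small sketch. It is not the paper's algorithm, and it rests on these assumptions. The per-step code lengths -log2 p_k(x_t) of each candidate model k are precomputed and passed in as the hypothetical input `code_lengths`. The model sequence is encoded with a fixed switching probability `switch_prob`, a simplification chosen here for illustration rather than the authors' model-sequence code. The criterion is therefore the total code length of the data under the selected models plus the code length of the model sequence, and a Viterbi-style dynamic program finds its exact minimum in O(T K^2) time for T steps and K candidate models.

```python
import math
from typing import List, Sequence, Tuple


def dynamic_model_selection(
    code_lengths: Sequence[Sequence[float]],
    switch_prob: float,
) -> Tuple[List[int], float]:
    """Return the model sequence k_1..k_T minimizing total code length (bits).

    Total code length = log2(K) for the first model
                      + sum_t code_lengths[t][k_t]   (data under chosen models)
                      + transition costs             (model sequence)
    code_lengths[t][k] is a hypothetical precomputed -log2 p_k(x_t).
    switch_prob is an assumed fixed probability of changing models per step.
    """
    if not 0.0 < switch_prob < 1.0:
        raise ValueError("switch_prob must lie in (0, 1)")
    if not code_lengths:
        return [], 0.0
    T, K = len(code_lengths), len(code_lengths[0])
    # Bits to encode "keep the same model" vs "switch to one of the K-1 others".
    stay = -math.log2(1.0 - switch_prob)
    move = -math.log2(switch_prob / (K - 1)) if K > 1 else math.inf

    # cost[k]: minimal code length of x_1..x_t given that model k is used at step t.
    cost = [math.log2(K) + code_lengths[0][k] for k in range(K)]
    back: List[List[int]] = []  # back[t-1][k]: best predecessor of model k at step t
    for t in range(1, T):
        new_cost, ptr = [], []
        for k in range(K):
            best_j = min(range(K), key=lambda j: cost[j] + (stay if j == k else move))
            trans = stay if best_j == k else move
            new_cost.append(cost[best_j] + trans + code_lengths[t][k])
            ptr.append(best_j)
        cost = new_cost
        back.append(ptr)

    # Backtrack from the best final model to recover the optimal sequence.
    k = min(range(K), key=lambda j: cost[j])
    total = cost[k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path, total
```

For example, with two models where model 0 fits the first three observations well (1 bit each) and model 1 fits the last three well, `dynamic_model_selection([[1, 5], [1, 5], [1, 5], [5, 1], [5, 1], [5, 1]], 0.1)` selects `[0, 0, 0, 1, 1, 1]`: one switch costs about 3.3 bits, far less than the 12 bits lost by staying with either model throughout. In an anomaly detection setting, a change in the selected model would be the kind of signal that could flag the emergence of a new behavior pattern.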
