Dynamic Model Selection With its Applications to Novelty Detection

We address the problem of dynamically selecting optimal statistical models from time series data. The goal is not to select a single optimal model for the whole data set, as in conventional model selection, but to select a time series of optimal models under the assumption that the data source may be nonstationary. We call this problem dynamic model selection (DMS). From the standpoint of the minimum description length (MDL) principle, we first propose coding-theoretic criteria for DMS. Next, we propose efficient DMS algorithms based on these criteria and analyze their performance in terms of total code length and computation time. Finally, we apply DMS to novelty detection and demonstrate its effectiveness through empirical results on masquerade detection using UNIX command sequences.
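To make the idea of selecting a sequence of models by total code length concrete, the following is a minimal sketch, not the algorithm proposed in this paper. It assumes that a per-datum code length for each candidate model (for example, a negative log predictive probability in bits) is already available, and it charges a fixed, hypothetical `switch_cost` in bits whenever the selected model changes. The function name `select_model_sequence`, the uniform code for the initial model, and the constant switching cost are all illustrative assumptions. Under those assumptions, a Viterbi-style dynamic program finds the model sequence with the shortest total code length.

```python
import math


def select_model_sequence(code_lengths, switch_cost):
    """Return (path, total) minimizing total code length.

    code_lengths[t][k] is the assumed code length, in bits, of datum t
    under candidate model k. switch_cost is a hypothetical fixed penalty,
    in bits, paid each time the selected model changes.
    """
    T = len(code_lengths)
    if T == 0:
        return [], 0.0
    K = len(code_lengths[0])

    # Assumption: the initial model is encoded with a uniform code over K models.
    best = [math.log2(K) + code_lengths[0][k] for k in range(K)]
    back = []  # back[t-1][k] = model selected at time t-1 on the best path ending in k

    for t in range(1, T):
        # With a constant switching cost, the best predecessor for a switch
        # is always the globally cheapest state at the previous step.
        cheapest = min(range(K), key=lambda j: best[j])
        new_best, pointers = [], []
        for k in range(K):
            stay = best[k]
            switch = best[cheapest] + switch_cost
            if stay <= switch:
                new_best.append(stay + code_lengths[t][k])
                pointers.append(k)
            else:
                new_best.append(switch + code_lengths[t][k])
                pointers.append(cheapest)
        best = new_best
        back.append(pointers)

    # Trace the optimal path backwards from the cheapest final state.
    k = min(range(K), key=lambda j: best[j])
    total = best[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Model 0 fits the first two data points well, model 1 the last two.
    lengths = [[1, 5], [1, 5], [5, 1], [5, 1]]
    print(select_model_sequence(lengths, switch_cost=2))
```

Because the switching cost is constant in this sketch, each time step costs O(K) rather than O(K^2), so the whole pass is O(TK). A criterion in which the cost of a model change depends on the pair of models involved would need the full O(TK^2) recursion.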
