Hidden Markov Models for Human/Computer Interface Modeling

Automated modeling of human behaviors is useful in the computer security domain of anomaly detection. In the user modeling facet of the anomaly detection domain, the task is to develop a model or profile of the normal working state of a computer system user and to detect anomalous conditions as deviations from expected behavior patterns. In this paper, we examine the use of hidden Markov models (HMMs) as user profiles for the anomaly detection task. We formulate a user identity classification system based on the posterior likelihood of the model parameters and present an approximation that allows this quantity to be quickly estimated to a high degree of accuracy for subsequences of the total sequence of observed data. We give an empirical analysis of the HMM anomaly detection sensor. We examine performance across a range of model sizes (i.e., number of hidden states). We demonstrate that, for most of our user population, a single-state model is inferior to the multi-state models, and that, within multi-state models, those with more states tend to model the profiled user more effectively but imposters less effectively than do smaller models. These observations are consistent with the interpretation that larger models are necessary to capture high degrees of user behavioral complexity. We describe extensions of these techniques to other tasks and domains.
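As a rough illustration of likelihood-based profiling with an HMM, the following Python sketch scores a window of discrete observations under a user's model with the standard scaled forward algorithm and flags the window when its per-symbol log-likelihood falls below a threshold. This is not the approximation described in the abstract, whose details are not given here. The function names, the `(start, trans, emit)` model layout, the integer encoding of observations, and the threshold rule are all assumptions made for the example.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the scaled forward pass.

    obs   : list of integer symbol indices (hypothetical encoding of user events)
    start : start[i]    = P(first hidden state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n_states = len(start)
    # Initialise with the start distribution weighted by the first emission.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model assigns zero probability to obs
        # Accumulate the log of each scaling factor; their product is P(obs).
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def is_anomalous(window, model, threshold):
    """Flag a window whose per-symbol log-likelihood is below `threshold`.

    `threshold` is an assumed tuning parameter, not a value from the paper.
    """
    start, trans, emit = model
    score = forward_log_likelihood(window, start, trans, emit) / len(window)
    return score < threshold
```

Rescaling the forward variables at every step keeps the computation numerically stable on long windows, where the unscaled probabilities would underflow. Dividing by the window length makes scores comparable across windows of different sizes.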
