Stream prediction using a generative model based on frequent episodes in event sequences

This paper presents a new algorithm for sequence prediction over long categorical event streams. The input to the algorithm is a set of target event types whose occurrences we wish to predict. The algorithm examines windows of events that precede occurrences of the target event types in historical data. For each target event type, a set of significant frequent episodes is obtained using formal connections between frequent episodes and Hidden Markov Models (HMMs). Each significant episode is associated with a specialized HMM, and a mixture of such HMMs is estimated for every target event type. The likelihoods of the current window of events under these mixture models are used to predict future occurrences of target events in the data. The only user-defined model parameter in the algorithm is the length of the event windows used during model estimation. We first evaluate the algorithm on synthetic data generated by embedding, in varying levels of noise, patterns that are preselected to characterize occurrences of target events. We then present an application of the algorithm to predicting targeted user behaviors from large volumes of anonymous search session interaction logs collected by a commercially deployed web browser toolbar.
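The window-extraction step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `preceding_windows` and `count_non_overlapped` are hypothetical, events are assumed to be hashable labels in a list, and the episode counter assumes serial (ordered) episodes counted by non-overlapped occurrences, which is only one of several frequency definitions in the frequent-episode literature. The HMM mixture estimation and the likelihood-based prediction are not shown.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def preceding_windows(
    stream: Sequence[Hashable], targets: Set[Hashable], window_len: int
) -> List[List[Hashable]]:
    """Collect the window_len events that immediately precede each target event.

    Target occurrences with fewer than window_len preceding events are skipped
    (an assumption of this sketch).
    """
    windows = []
    for i, event in enumerate(stream):
        if event in targets and i >= window_len:
            windows.append(list(stream[i - window_len:i]))
    return windows


def count_non_overlapped(
    window: Sequence[Hashable], episode: Tuple[Hashable, ...]
) -> int:
    """Count non-overlapped occurrences of a non-empty serial episode in a window.

    A single left-to-right pass tracks the next episode node to match; when the
    last node is matched, one occurrence is counted and matching restarts.
    """
    count = 0
    pos = 0  # index of the next episode node we are waiting for
    for event in window:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count
```

For example, with the stream `list("ABCXABDX")`, target set `{"X"}` and a window length of 3, `preceding_windows` yields the two windows `A B C` and `A B D`, and the serial episode `("A", "B")` occurs once in each.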
