Streaming Algorithms for Pattern Discovery over Dynamically Changing Event Sequences

Discovering frequent episodes over event sequences is an important data mining task. In many applications, events constituting the data sequence arrive as a stream, at furious rates, and recent trends (or frequent episodes) can change and drift due to the dynamical nature of the underlying event generation process. The ability to detect and track such the changing sets of frequent episodes can be valuable in many application scenarios. Current methods for frequent episode discovery are typically multipass algorithms, making them unsuitable in the streaming context. In this paper, we propose a new streaming algorithm for discovering frequent episodes over a window of recent events in the stream. Our algorithm processes events as they arrive, one batch at a time, while discovering the top frequent episodes over a window consisting of several batches in the immediate past. We derive approximation guarantees for our algorithm under the condition that frequent episodes are approximately well-separated from infrequent ones in every batch of the window. We present extensive experimental evaluations of our algorithm on both real and synthetic data. We also present empirical comparisons with baselines and adaptations of streaming algorithms from itemset mining literature.

[1]  Eamonn J. Keogh,et al.  Online discovery and maintenance of time series motifs , 2010, KDD.

[2]  Wilfred Ng,et al.  A survey on algorithms for mining frequent itemsets over data streams , 2008, Knowledge and Information Systems.

[3]  Srivatsan Laxman Discovering Frequent Episodes : Fast Algorithms, Connections With HMMs And Generalizations , 2006 .

[4]  Jiawei Han,et al.  Stream Sequential Pattern Mining with Precise Error Bounds , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[5]  S. Muthukrishnan,et al.  Data streams: algorithms and applications , 2005, SODA '03.

[6]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[7]  Richard M. Karp,et al.  A simple algorithm for finding frequent elements in streams and bags , 2003, TODS.

[8]  Raymond Chi-Wing Wong,et al.  Mining top-K frequent itemsets from data streams , 2006, Data Mining and Knowledge Discovery.

[9]  Raajay Viswanathan,et al.  Discovering injective episodes with general partial orders , 2011, Data Mining and Knowledge Discovery.

[10]  A. Akhmetova Discovery of Frequent Episodes in Event Sequences , 2006 .

[11]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[12]  Naren Ramakrishnan,et al.  Discovering excitatory relationships using dynamic Bayesian networks , 2011, Knowledge and Information Systems.

[13]  Steve M. Potter,et al.  An extremely rich repertoire of bursting patterns during the development of cortical cultures , 2006, BMC Neuroscience.

[14]  Ruoming Jin,et al.  An algorithm for in-core frequent itemset mining on streaming data , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).