Efficient Episode Mining of Dynamic Event Streams

Discovering frequent episodes over event sequences is an important data mining problem. Existing methods typically require multiple passes over the data, which makes them unsuitable for streaming contexts. We present the first streaming algorithm for mining frequent episodes over a window of recent events in the stream. We derive approximation guarantees for our algorithm in terms of (i) the separation of frequent episodes from infrequent ones and (ii) the rate of change of stream characteristics. Our parameterization of the problem provides a new sweet spot in the tradeoff between making distributional assumptions about the stream and the algorithmic efficiency of mining. We illustrate how this yields significant benefits when mining practical streams from neuroscience and telecommunications logs.
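To make the problem setting concrete, the following is a minimal sketch of what "counting an episode over a window of recent events" can look like. It is not the algorithm presented in the paper: it assumes a serial episode (an ordered tuple of event types), assumes frequency is measured as the number of non-overlapped occurrences, and naively recounts the whole window on every arrival. The names `count_non_overlapped` and `WindowedEpisodeCounter` are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Deque, Hashable, Iterable, Sequence, Tuple

Event = Tuple[Hashable, float]  # (event type, timestamp); illustrative layout


def count_non_overlapped(events: Iterable[Event],
                         episode: Sequence[Hashable]) -> int:
    """Count non-overlapped occurrences of a serial episode.

    Assumes `events` is ordered by time. A single greedy automaton is
    used: it waits for each episode symbol in turn and resets after
    every completed occurrence.
    """
    if not episode:
        return 0
    count = 0
    pos = 0  # index of the next episode symbol we are waiting for
    for event_type, _timestamp in events:
        if event_type == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count


class WindowedEpisodeCounter:
    """Naive baseline (hypothetical): keep the last `window_size` events
    and recount the episode from scratch whenever a new event arrives."""

    def __init__(self, episode: Sequence[Hashable], window_size: int) -> None:
        self.episode = tuple(episode)
        # deque with maxlen drops the oldest event automatically.
        self.window: Deque[Event] = deque(maxlen=window_size)

    def push(self, event_type: Hashable, timestamp: float) -> int:
        """Add one event and return the episode count in the current window."""
        self.window.append((event_type, timestamp))
        return count_non_overlapped(self.window, self.episode)


if __name__ == "__main__":
    counter = WindowedEpisodeCounter(("A", "B", "C"), window_size=4)
    for t, symbol in enumerate("ABCABACBC"):
        print(symbol, counter.push(symbol, float(t)))
```

The baseline costs time proportional to the window size for every arriving event and tracks only one candidate episode, which is the kind of overhead a single-pass streaming method aims to avoid.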
