Detecting and monitoring abrupt emergences and submergences of episodes over data streams

Existing studies on episode mining concentrate mainly on discovering (globally) frequent episodes in sequences. Frequent episodes, however, are poorly suited to data streams because they do not capture the dynamic nature of the streams. This paper focuses on detecting dynamic changes in the frequencies of episodes over time-evolving streams. We propose an efficient method for the online detection of abrupt emerging episodes and abrupt submerging episodes over streams. Experimental results on synthetic data show that the proposed method effectively detects the defined patterns and meets the strict requirements of stream processing: a single pass over the data, real-time updating and reporting of results, and limited time and space consumption. Experimental results on real data demonstrate that the patterns detected by our method are natural and meaningful. The proposed method has wide applications in stream monitoring and analysis, since the discovered patterns indicate the dynamic emergence or disappearance of noteworthy events or phenomena hidden in the streams.
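To make the notion of an abrupt change in episode frequency concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a serial episode (an ordered tuple of event types), counts its non-overlapping occurrences in consecutive fixed-size batches of the stream, and flags a batch as "emerging" or "submerging" when the count grows or shrinks by at least a given ratio relative to the previous batch. The function names, the batch-based comparison, and the parameters `window`, `ratio`, and `min_count` are all illustrative assumptions; the abstract does not give the paper's own definitions.

```python
def count_occurrences(events, episode):
    """Count non-overlapping occurrences of a serial episode (greedy, left to right)."""
    count, pos = 0, 0
    for e in events:
        if e == episode[pos]:
            pos += 1
            if pos == len(episode):  # full episode matched
                count += 1
                pos = 0
    return count


def detect_changes(stream, episode, window, ratio=3.0, min_count=2):
    """Yield (index, label) when the episode count changes abruptly between batches.

    Hypothetical criterion (an assumption, not the paper's definition):
    - "emerging":   current count >= min_count and >= ratio * previous count
    - "submerging": previous count >= min_count and >= ratio * current count
    A zero count is treated as 1 in the ratio so the comparison stays defined.
    """
    buf, prev = [], None
    for i, e in enumerate(stream):  # single pass over the stream
        buf.append(e)
        if len(buf) == window:
            cur = count_occurrences(buf, episode)
            if prev is not None:
                if cur >= min_count and cur >= ratio * max(prev, 1):
                    yield (i, "emerging")
                elif prev >= min_count and prev >= ratio * max(cur, 1):
                    yield (i, "submerging")
            prev, buf = cur, []  # keep only the last count: constant state per episode


# Usage: the episode ("a", "b") is absent, then frequent, then absent again.
stream = list("xxxxxxxx") + list("abababab") + list("xxxxxxxx")
changes = list(detect_changes(stream, ("a", "b"), window=8))
print(changes)
```

The sketch keeps only the current batch and one previous count per episode, which mirrors the one-pass, bounded-memory constraints mentioned above. A practical detector would also have to track many candidate episodes at once.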
