Pattern-growth based frequent serial episode discovery

Frequent episode discovery is a popular framework for pattern discovery from sequential data. It has found applications in domains such as alarm management in telecommunication networks, fault analysis in manufacturing plants, and prediction of user behavior in web click streams. In this paper, we address the discovery of serial episodes. Several ways of quantifying the frequency of an episode have been proposed. Most current algorithms for episode discovery under these frequency definitions are apriori-based level-wise methods, which essentially perform a breadth-first search of the pattern space. For many of the frequency definitions, however, no depth-first methods of pattern discovery currently exist in the frequent episode framework. In this paper, we try to bridge this gap. We provide new depth-first algorithms for serial episode discovery under the non-overlapped and total frequencies. Under non-overlapped frequency, we present algorithms that can handle span and gap constraints on episode occurrences. Under total frequency, we present an algorithm that can handle a span constraint. We provide proofs of correctness for the proposed algorithms and demonstrate their effectiveness through extensive simulations. We also give detailed run-time comparisons with the existing apriori-based methods and illustrate scenarios in which the proposed pattern-growth algorithms perform better than their apriori counterparts.
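
To make the notion of non-overlapped frequency concrete, the following is a minimal sketch of counting non-overlapped occurrences of a single serial episode in one event sequence. It is not one of the pattern-growth algorithms proposed in the paper: it is the standard single-automaton greedy scan, shown only to illustrate the frequency measure, and it assumes no span or gap constraints. The function name `count_non_overlapped` and the `(event_type, time)` tuple layout are assumptions made for this illustration.

```python
from typing import Sequence, Tuple


def count_non_overlapped(
    events: Sequence[Tuple[str, int]], episode: Sequence[str]
) -> int:
    """Count non-overlapped occurrences of a serial episode.

    `events` is a time-ordered list of (event_type, time) pairs and
    `episode` is the ordered list of event types to match. Illustrative
    sketch only: no span or gap constraints are enforced.
    """
    if not episode:
        return 0
    count = 0
    pos = 0  # index of the next episode node we are waiting for
    for event_type, _time in events:
        if event_type == episode[pos]:
            pos += 1
            if pos == len(episode):
                # One full occurrence completed; start over so the next
                # occurrence begins strictly after this one ends.
                count += 1
                pos = 0
    return count


stream = [
    ("A", 1), ("B", 2), ("A", 3), ("C", 4),
    ("A", 5), ("B", 6), ("B", 7), ("C", 8),
]
print(count_non_overlapped(stream, ["A", "B", "C"]))  # prints 2
```

In this stream the episode A -> B -> C is completed at times (1, 2, 4) and again at (5, 6, 8); the two occurrences do not interleave, so the count is 2. Resetting the automaton after each completed occurrence is what keeps the counted occurrences non-overlapped.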
