Mining Closed Strict Episodes

Discovering patterns in a sequence is an important aspect of data mining. One popular choice of such patterns are episodes, patterns in sequential data describing events that often occur in the vicinity of each other. Episodes also enforce in which order events are allowed to occur. In this work we introduce a technique for discovering closed episodes. Adopting existing approaches for discovering traditional patterns, such as closed item sets, to episodes is not straightforward. First of all, we cannot define a unique closure based on frequency because an episode may have several closed super episodes. Moreover, to define a closedness concept for episodes we need a subset relationship between episodes, which is not trivial to define. We approach these problems by introducing strict episodes. We argue that this class is general enough, and at the same time we are able to define a natural subset relationship within it and use it efficiently. In order to mine closed episodes we define an auxiliary closure operator. We show that this closure satisfies the needed Galois connection so that we can use the existing framework for mining closed patterns. Discovering the true closed episodes can be done as a post-processing step. We combine these observations into an efficient mining algorithm and demonstrate empirically its performance in practice.

[1]  Céline Robardet,et al.  A New Constraint for Mining Sets in Sequences , 2009, SDM.

[2]  Jiawei Han,et al.  TSP: mining top-K closed sequential patterns , 2003, Third IEEE International Conference on Data Mining.

[3]  P. S. Sastry,et al.  A fast algorithm for finding frequent episodes in event streams , 2007, KDD '07.

[4]  Toon Calders,et al.  Mining Frequent Itemsets in a Stream , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[5]  Kaizhong Zhang,et al.  Combinatorial pattern discovery for scientific data: some preliminary results , 1994, SIGMOD '94.

[6]  Christophe Rigotti,et al.  Constraint-Based Mining of Episode Rules and Optimal Window Sizes , 2004, PKDD.

[7]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[8]  Gemma C. Garriga,et al.  Summarizing Sequential Data with Closed Partial Orders , 2005, SDM.

[9]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[10]  Philip S. Yu,et al.  Discovering Frequent Closed Partial Orders from Strings , 2006, IEEE Transactions on Knowledge and Data Engineering.

[11]  Gemma Casas-Garriga Discovering Unbounded Episodes in Sequential Data , 2003 .

[12]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[13]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[14]  Heikki Mannila,et al.  Levelwise Search and Borders of Theories in Knowledge Discovery , 1997, Data Mining and Knowledge Discovery.

[15]  Nikolaj Tatti Significance of Episodes Based on Minimal Windows , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[16]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[17]  Kyuseok Shim,et al.  Mining Sequential Patterns with Regular Expression Constraints , 2002, IEEE Trans. Knowl. Data Eng..

[18]  Jiawei Han,et al.  BIDE: efficient mining of frequent closed sequences , 2004, Proceedings. 20th International Conference on Data Engineering.

[19]  Mikhail J. Atallah,et al.  Reliable detection of episodes in event sequences , 2004, Knowledge and Information Systems.

[20]  A. Akhmetova Discovery of Frequent Episodes in Event Sequences , 2006 .

[21]  Hongyan Liu,et al.  Mining Closed Episodes from Event Sequences Efficiently , 2010, PAKDD.

[22]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[23]  Mikhail J. Atallah,et al.  Markov Models for Identification of Significant Episodes , 2005, SDM.

[24]  P. S. Sastry,et al.  A survey of temporal data mining , 2006 .