Fast Mining of Non-derivable Episode Rules in Complex Sequences

Researchers have sought to discover concise sets of episode rules in sequences rather than complete sets. Existing approaches, however, cannot process complex sequences and cannot guarantee the accuracy of the resulting sets, because the frequency metrics they use violate anti-monotonicity. In some real applications, episode rules need to be extracted from complex sequences, in which multiple items may appear in a single time slot. This paper investigates the discovery of concise episode rules in complex sequences. We define a concise representation called non-derivable episode rules and formalize the mining problem. Adopting a novel anti-monotonic frequency metric, we then develop a fast approach to discover non-derivable episode rules in complex sequences. Experimental results demonstrate that the proposed approach substantially reduces the number of rules and achieves fast processing.
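To make the notions of a complex sequence and an anti-monotonic frequency metric concrete, the following is a minimal sketch only. It assumes a complex sequence can be modeled as a list of item sets (one set per time slot) and uses the count of non-overlapping occurrences of a serial episode as a stand-in metric. That stand-in is a well-known anti-monotonic measure, not the novel metric proposed in this paper, which the abstract does not specify. The function name `count_non_overlapping` and the sample data are hypothetical.

```python
from typing import List, Sequence, Set


def count_non_overlapping(sequence: Sequence[Set[str]], episode: List[str]) -> int:
    """Count non-overlapping occurrences of a serial episode.

    Assumption (not from the paper): each episode item must be matched in a
    strictly later time slot than the previous item, and a new occurrence may
    only start after the previous occurrence has ended.
    """
    if not episode:
        raise ValueError("episode must contain at least one item")
    count = 0
    pos = 0  # index of the next episode item to match
    for slot in sequence:
        # At most one episode item is matched per time slot.
        if episode[pos] in slot:
            pos += 1
            if pos == len(episode):
                count += 1  # one full occurrence completed
                pos = 0     # restart matching from the next slot
    return count


# A complex sequence: several items may share one time slot.
sequence = [{"A", "B"}, {"C"}, {"A"}, {"B", "C"}, {"A", "C"}, {"B"}]

freq_a = count_non_overlapping(sequence, ["A"])
freq_ab = count_non_overlapping(sequence, ["A", "B"])
freq_abc = count_non_overlapping(sequence, ["A", "B", "C"])

# Anti-monotonicity: extending an episode never increases its frequency.
print(freq_a, freq_ab, freq_abc)
```

Under these assumptions, the frequency of an episode never exceeds that of any of its sub-episodes, which is the property that lets a miner prune every extension of an infrequent episode.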
