Mining Dependent Frequent Serial Episodes from Uncertain Sequence Data

In this paper, we focus on the problem of mining Probabilistic Dependent Frequent Serial Episodes (P-DFSEs) from uncertain sequence data. Observing that the frequentness probability of an episode in an uncertain sequence is a Markov chain imbeddable variable, we first propose an Embedded Markov Chain-based algorithm that computes the frequentness probability of an episode efficiently by projecting the probability space onto a limited set of partitions. To further improve computational efficiency, we devise an optimized approach that prunes candidate episodes early by estimating an upper bound on their frequentness probabilities.
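The Markov chain embedding idea can be illustrated with a minimal sketch. It is not the algorithm of the paper, and it rests on several assumptions that the abstract does not state: each position of the uncertain sequence carries an independent distribution over event types, support is counted as the number of non-overlapping occurrences found by a greedy left-to-right automaton, and the frequentness probability is P(support >= min_sup). The function name `frequentness_probability` and its parameters are hypothetical. The chain's state is the pair (completed occurrences capped at `min_sup`, symbols matched in the current partial occurrence), so the exponentially many possible worlds collapse into at most `(min_sup + 1) * len(episode)` partitions.

```python
from typing import Dict, List, Sequence


def frequentness_probability(
    seq: Sequence[Dict[str, float]],
    episode: Sequence[str],
    min_sup: int,
) -> float:
    """P(support >= min_sup) under the assumptions stated above.

    seq[i] maps an event type to the probability that it occurs at
    position i (assumed independent across positions; any missing
    mass means "some other event or no event").
    """
    k = len(episode)  # assumed to be at least 1
    # state[c][j]: probability of c completed occurrences (capped at
    # min_sup) with j symbols of the next occurrence already matched.
    state: List[List[float]] = [[0.0] * k for _ in range(min_sup + 1)]
    state[0][0] = 1.0

    for dist in seq:
        nxt = [[0.0] * k for _ in range(min_sup + 1)]
        for c in range(min_sup + 1):
            for j in range(k):
                p = state[c][j]
                if p == 0.0:
                    continue
                if c == min_sup:
                    # Absorbing level: the threshold is already reached.
                    nxt[c][j] += p
                    continue
                hit = dist.get(episode[j], 0.0)
                nxt[c][j] += p * (1.0 - hit)      # awaited symbol absent
                if j + 1 == k:
                    nxt[c + 1][0] += p * hit      # occurrence completed
                else:
                    nxt[c][j + 1] += p * hit      # partial match advances
        state = nxt

    return sum(state[min_sup])


if __name__ == "__main__":
    uncertain = [{"A": 0.5}, {"B": 0.5}]
    print(frequentness_probability(uncertain, ("A", "B"), 1))
```

The cost of this sketch is proportional to the sequence length times the number of states, so it avoids enumerating possible worlds. An upper-bound pruning step as described in the abstract would sit in front of this call and skip candidates whose bound already falls below the probability threshold; that step is not shown here because the abstract does not specify the bound.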