Mining uncertain data for frequent itemsets that satisfy aggregate constraints

Many existing algorithms mine transaction databases of precise data for frequent itemsets. However, in some situations the user is interested in only a small portion of all the frequent itemsets, and in others the data in the transaction database are uncertain. These situations call for both (i) constrained mining, which finds only those frequent itemsets that satisfy user-specified constraints expressing the user's interest, and (ii) mining of uncertain data. In this paper, we propose a tree-based algorithm that effectively mines transaction databases of uncertain data for only those frequent itemsets that satisfy user-specified aggregate constraints. The algorithm avoids candidate generation and pushes the aggregate constraints inside the mining process, which reduces computation and avoids unnecessary constraint checking.
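To make the idea of pushing an aggregate constraint inside the mining process concrete, the following is a minimal sketch and not the tree-based algorithm proposed in the paper. It assumes the expected-support model that is common in uncertain-data mining (each item in a transaction carries an existential probability, and the expected support of an itemset is the sum over transactions of the product of its items' probabilities). It also assumes one particular aggregate constraint, sum(price) <= max_total with non-negative prices. The names `expected_support`, `mine`, `price`, and `max_total` are hypothetical, and the sketch uses a plain depth-first enumeration rather than a tree structure.

```python
from math import prod


def expected_support(itemset, db):
    """Sum, over transactions, of the product of the items' probabilities.

    An item absent from a transaction contributes probability 0.0.
    """
    return sum(prod(t.get(i, 0.0) for i in itemset) for t in db)


def mine(db, minsup, price, max_total):
    """Depth-first enumeration with the constraint checked during the search.

    Assumption: prices are non-negative, so sum(price) <= max_total is
    anti-monotone (once violated, every superset violates it too).
    Expected support is also anti-monotone, so both tests prune subtrees.
    """
    items = sorted({i for t in db for i in t})
    results = {}

    def extend(prefix, start):
        for k in range(start, len(items)):
            cand = prefix + (items[k],)
            # Constraint check first: skips the support computation entirely.
            if sum(price[i] for i in cand) > max_total:
                continue
            sup = expected_support(cand, db)
            if sup < minsup:
                continue
            results[cand] = sup
            extend(cand, k + 1)

    extend((), 0)
    return results


# Illustrative data only: three uncertain transactions (item -> probability).
db = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.7},
    {"b": 0.4, "c": 0.9},
]
price = {"a": 10, "b": 20, "c": 40}

result = mine(db, minsup=0.7, price=price, max_total=50)
for itemset, sup in sorted(result.items()):
    print(itemset, round(sup, 2))
```

In this sketch, the itemset {b, c} is rejected by the price check before its expected support is ever computed, and {a, b, c} is never generated. This is the kind of saving the abstract refers to when it mentions reduced computation and fewer constraint checks.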
