Constrained frequent itemset mining from uncertain data streams

Frequent itemset mining is a common data mining task for many real-life applications. The mined frequent itemsets can serve as building blocks for various patterns, including association rules and frequent sequences. Many existing algorithms mine frequent itemsets from traditional static transaction databases, in which the contents of each transaction (namely, its items) are definitely known and precise. However, there are many situations in which users are uncertain about the contents of transactions. This calls for the mining of uncertain data. Moreover, there are also situations in which users are interested in only a subset of the mined frequent itemsets (i.e., itemsets satisfying user-specified constraints, which express the user's interest). This leads to constrained mining. Furthermore, due to advances in technology, large volumes of data are continuously generated in many situations. This calls for the mining of data streams. To deal with all these situations, we propose tree-based algorithms to efficiently mine streams of uncertain data for frequent itemsets that satisfy user-specified constraints.
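To make the three ingredients concrete (uncertain transactions, a user-specified constraint, and a stream processed in batches), here is a minimal Python sketch. It assumes the common uncertain-data model in which each item in a transaction carries an independent existential probability, so the expected support of an itemset is the sum, over transactions, of the product of its items' probabilities. It also assumes a sliding window over the most recent batches and an anti-monotone constraint (for example, "every item costs at most 30"). The class name `ConstrainedUncertainStreamMiner`, its parameters, and the level-wise (Apriori-style) enumeration are hypothetical stand-ins chosen for brevity; this is not the tree-based algorithm proposed above.

```python
from collections import deque
from itertools import combinations
from math import prod
from typing import Callable, Dict, FrozenSet, List

# Assumed representation: an uncertain transaction maps each item to its
# existential probability (0 < p <= 1); items are assumed independent.
UncertainTxn = Dict[str, float]


def expected_support(itemset: FrozenSet[str], txns: List[UncertainTxn]) -> float:
    """Sum, over transactions, of the probability that all items co-occur."""
    total = 0.0
    for t in txns:
        if all(i in t for i in itemset):
            total += prod(t[i] for i in itemset)
    return total


class ConstrainedUncertainStreamMiner:
    """Hypothetical brute-force sketch (not tree-based): keeps the last
    `window_size` batches and returns itemsets that satisfy `constraint`
    and have expected support >= `minsup`."""

    def __init__(self, window_size: int, minsup: float,
                 constraint: Callable[[FrozenSet[str]], bool]):
        # A full deque drops its oldest batch when a new one is appended.
        self.window = deque(maxlen=window_size)
        self.minsup = minsup
        self.constraint = constraint

    def add_batch(self, batch: List[UncertainTxn]) -> None:
        self.window.append(batch)

    def mine(self) -> Dict[FrozenSet[str], float]:
        txns = [t for batch in self.window for t in batch]
        # Push the constraint early: with an anti-monotone constraint, every
        # item of a valid itemset must itself satisfy the constraint.
        items = sorted({i for t in txns for i in t
                        if self.constraint(frozenset([i]))})
        result: Dict[FrozenSet[str], float] = {}
        level = [frozenset([i]) for i in items]
        k = 1
        while level:
            frequent = []
            for s in level:
                if not self.constraint(s):
                    continue  # supersets would violate it as well
                sup = expected_support(s, txns)
                if sup >= self.minsup:
                    result[s] = sup
                    frequent.append(s)
            # Expected support is anti-monotone, so only frequent k-itemsets
            # are joined into (k+1)-itemset candidates.
            k += 1
            level = list({a | b for a, b in combinations(frequent, 2)
                          if len(a | b) == k})
        return result
```

For example, with item prices `{"a": 10, "b": 20, "c": 50}`, the constraint `lambda s: all(price[i] <= 30 for i in s)`, a window of two batches, and `minsup=1.0`, the batches `[{"a": 0.9, "b": 0.8}, {"a": 0.5, "c": 1.0}]` and `[{"a": 1.0, "b": 0.5}]` yield `{a}`, `{b}`, and `{a, b}` (expected support 0.72 + 0.5 = 1.22), while `{c}` is pruned by the constraint before any support is computed. The level-wise enumeration rescans the whole window on every call; tree-based approaches instead aim to avoid such repeated scans by keeping the window's transactions in a compact tree.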
