Fast Algorithms for Frequent Itemset Mining from Uncertain Data

The majority of existing data mining algorithms mine frequent itemsets from precise databases. A well-known algorithm is FP-growth, which builds a compact FP-tree structure to capture the important contents of the database and then mines frequent itemsets from that tree. However, there are situations in which the data are uncertain. In recent years, researchers have paid attention to frequent itemset mining from uncertain databases, and UFP-growth is one of the most frequently cited algorithms for this task. However, its UFP-tree structure can be large. Other tree structures for handling uncertain data may achieve compactness at the expense of looser upper bounds on expected supports. To address this trade-off, we propose two compact tree structures that capture uncertain data with tighter upper bounds than existing tree structures. We also design two algorithms that mine frequent itemsets from the proposed trees. Our experimental results show the tightness of the upper bounds on expected supports provided by these algorithms.
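To make "expected support" and "upper bound on expected support" concrete, here is a minimal sketch under the common existential-uncertainty model, in which each item in a transaction carries a probability and items are assumed independent. The function names `expected_support` and `prefix_cap_bound`, the toy database, and the specific bound are all hypothetical illustrations. The bound is a simple prefix-based cap chosen for this example; it is not claimed to be either of the bounds proposed in this paper. It only shows why a bound that depends on a transaction's prefix (and can therefore be stored in a prefix tree) overestimates the exact value.

```python
from math import prod

def expected_support(itemset, db):
    """Exact expected support, assuming independent item probabilities:
    the sum, over transactions containing every item of `itemset`,
    of the product of those items' existential probabilities."""
    return sum(
        prod(t[x] for x in itemset)
        for t in db
        if all(x in t for x in itemset)
    )

def prefix_cap_bound(itemset, db, order):
    """Hypothetical illustrative upper bound on expected support.

    `order` maps each item to its rank in a fixed global ordering. Let x_r be
    the itemset's last item in that order. For every transaction t containing
    the itemset, cap(t) = P(x_r, t) * max{P(y, t) : y in t, y precedes x_r}.
    The product over the itemset is at most P(x_r, t) times the probability
    of any single other member, and every other member precedes x_r, so
    cap(t) >= that product. cap(t) depends only on t's items up to x_r.
    """
    if len(itemset) < 2:
        # Exact value is cheap for single items; the cap is not a valid
        # bound here because it can be smaller than P(x_r, t).
        return expected_support(itemset, db)
    last = max(itemset, key=order.__getitem__)
    total = 0.0
    for t in db:
        if not all(x in t for x in itemset):
            continue
        # Non-empty: the itemset has another member that precedes `last`.
        prefix_probs = [p for y, p in t.items() if order[y] < order[last]]
        total += t[last] * max(prefix_probs)
    return total

# Toy uncertain database (hypothetical data): item -> existential probability.
db = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.7},
    {"b": 0.4, "c": 0.9},
]
order = {"a": 0, "b": 1, "c": 2}

print(expected_support({"b", "c"}, db), prefix_cap_bound({"b", "c"}, db, order))
```

For {b, c}, the exact expected support is 0.8 × 0.5 + 0.4 × 0.9 = 0.76, while the cap sums to 0.5 × 0.9 + 0.9 × 0.4 = 0.81. The gap between these two numbers is the looseness that tighter tree-based bounds aim to reduce, since a looser bound lets more infrequent candidates survive pruning.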
