A Scalable Data Analytics Algorithm for Mining Frequent Patterns from Uncertain Data

With advances in technology, massive amounts of valuable data can be collected and transmitted at high velocity in various scientific, biomedical, and engineering applications. Hence, scalable data analytics tools are in demand for analyzing these data. For example, scalable tools for association analysis help reveal frequently occurring patterns and the relationships among them, which in turn support intelligent decisions. While the majority of existing frequent pattern mining algorithms (e.g., FP-growth) handle only precise data, there are situations in which data are uncertain. In recent years, tree-based algorithms for mining uncertain data (e.g., UF-growth, UFP-growth) have been developed. However, the tree structures used by these algorithms can be large. Other tree structures for handling uncertain data achieve compactness, but at the expense of loose upper bounds on expected supports. In this paper, we propose (i) a compact tree structure that captures uncertain data with tighter upper bounds on expected supports than the aforementioned tree structures and (ii) a scalable data analytics algorithm that mines frequent patterns from our tree structure. Experimental results show the tightness of the upper bounds on expected supports provided by our algorithm.
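
To make the notion of an expected support and of an upper bound on it concrete, the following is a minimal sketch. It assumes the definition commonly used in the uncertain frequent pattern mining literature, where each item in a transaction carries an existential probability, items are treated as independent, and the expected support of a pattern is the sum over transactions of the product of its items' probabilities. The function names, the toy data, and the simple minimum-probability bound are illustrative assumptions; they are not the tree structure or the bound proposed in this paper.

```python
from math import prod

def expected_support(pattern, transactions):
    """Sum, over all transactions, of the product of the existential
    probabilities of the pattern's items (0 if an item is absent).
    Assumes items within a transaction are independent."""
    return sum(
        prod(t.get(item, 0.0) for item in pattern)
        for t in transactions
    )

def min_prob_upper_bound(pattern, transactions):
    """A deliberately simple upper bound (an illustration only, not the
    paper's bound): a product of probabilities never exceeds the smallest
    of them, so summing the minimum per transaction over-estimates the
    expected support."""
    return sum(
        min(t.get(item, 0.0) for item in pattern)
        for t in transactions
    )

# Hypothetical toy database: each transaction maps item -> probability.
transactions = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 0.4},
    {"a": 0.6, "b": 0.5, "c": 0.2},
]

pattern = {"a", "b"}
exp_sup = expected_support(pattern, transactions)    # 0.72 + 0 + 0.30
bound = min_prob_upper_bound(pattern, transactions)  # 0.80 + 0 + 0.50
print(exp_sup, bound)
```

A pattern whose upper bound falls below the minimum support threshold can be pruned without computing its exact expected support, so a tighter bound leaves fewer false candidates to verify.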
