A Practice Probability Frequent Pattern Mining Method over Transactional Uncertain Data Streams

In recent years, large amounts of uncertain data have emerged with the widespread adoption of new technologies such as wireless sensor networks, RFID and privacy protection. Uncertain data streams are incomplete, noisy, non-uniform and mutable. To address these features, this paper presents a probability frequent pattern tree called PFP-tree and a method called PFP-growth, which mine probability frequent patterns based on probability damped windows. The main characteristics of the proposed method are: (1) adopting a time-based probability damped window model to improve the accuracy of the mined frequent patterns; (2) maintaining an item index table and a transaction index table to speed up retrieval on the PFP-tree; and (3) pruning the tree to remove items that cannot become frequent patterns. The experimental results demonstrate that the PFP-growth method performs better than the main existing schemes in terms of accuracy, processing time and storage space.
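The abstract does not give the formulas behind the probability damped window, so the following is only a minimal sketch of the general idea, not the PFP-growth method itself. It assumes that each transaction stores an existence probability per item, that items are independent, and that a transaction's contribution decays exponentially with its age. The function name `damped_expected_support`, the `decay` parameter and the sample stream are hypothetical and were chosen for illustration.

```python
from math import prod


def damped_expected_support(stream, itemset, now, decay=0.9):
    """Expected support of `itemset`, with older transactions down-weighted.

    `stream` is an iterable of (timestamp, {item: probability}) pairs.
    Assumptions (not taken from the paper): item independence and an
    exponential decay factor applied per time unit of transaction age.
    """
    total = 0.0
    for timestamp, transaction in stream:
        # Probability that all items of the itemset occur in this transaction;
        # an item absent from the transaction has probability 0.
        p = prod(transaction.get(item, 0.0) for item in itemset)
        # Down-weight the contribution by the age of the transaction.
        total += p * decay ** (now - timestamp)
    return total


# Hypothetical stream of three uncertain transactions.
stream = [
    (1, {"a": 0.9, "b": 0.5}),
    (2, {"a": 0.8, "c": 0.4}),
    (3, {"a": 1.0, "b": 0.5}),
]
support = damped_expected_support(stream, ("a", "b"), now=3, decay=0.5)
```

Under these assumptions, an itemset would be reported as frequent when its damped expected support reaches a user-chosen threshold, so recent transactions influence the result more than old ones.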
