A Parallel Algorithm of Mining Frequent Pattern on Uncertain Data Streams

Ever more data are generated every day, and real-world applications place increasingly strict demands on the efficiency of mining algorithms. Consequently, improving the time and space efficiency of mining algorithms has become a hot research topic in frequent pattern mining over uncertain data. To address frequent pattern mining over dynamic uncertain data streams, this paper builds on existing research and proposes a parallel approximate mining algorithm based on the MapReduce framework, combined with a highly efficient algorithm for static data. With this algorithm, all frequent patterns in a sliding window can be mined using at most two MapReduce passes. In our experiments, the frequent itemsets were in most cases discovered accurately after a single MapReduce pass. The experimental results show that the proposed algorithm is considerably more time- and space-efficient than the other algorithms compared.
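The "at most two MapReduce passes" bound can be illustrated with a minimal sketch. It rests on assumptions of ours, not details taken from the paper: each uncertain transaction maps items to independent existence probabilities, an itemset's expected support in a transaction is the product of its items' probabilities, and the two passes follow the classic SON partitioning scheme. Pass 1 mines the locally frequent itemsets of each partition; pass 2 computes the exact expected support of their union. Because expected support is additive across partitions, any globally frequent itemset must be locally frequent in at least one partition, so pass 2 cannot miss a frequent itemset. MapReduce is simulated in-process, local mining is brute force rather than the paper's static-data algorithm, and all function names are hypothetical. The paper's single-pass approximation is not reproduced.

```python
from collections import defaultdict, deque
from itertools import combinations
from math import prod

# Hypothetical representation: an uncertain transaction maps item -> existence probability.
Transaction = dict[str, float]
Itemset = tuple[str, ...]


def expected_support(itemset: Itemset, tx: Transaction) -> float:
    """Probability that all items of `itemset` occur in `tx`, assuming independence."""
    if not all(item in tx for item in itemset):
        return 0.0
    return prod(tx[item] for item in itemset)


def local_frequent(partition: list[Transaction], min_ratio: float, max_len: int) -> set[Itemset]:
    """Pass-1 mapper (hypothetical): brute-force itemsets frequent within one partition."""
    esup: dict[Itemset, float] = defaultdict(float)
    for tx in partition:
        items = sorted(tx)
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                esup[itemset] += expected_support(itemset, tx)
    threshold = min_ratio * len(partition)
    return {s for s, v in esup.items() if v >= threshold}


def count_candidates(partition: list[Transaction], candidates: set[Itemset]) -> dict[Itemset, float]:
    """Pass-2 mapper (hypothetical): expected support of every global candidate in one partition."""
    return {c: sum(expected_support(c, tx) for tx in partition) for c in candidates}


def mine_window(window, n_partitions: int, min_ratio: float, max_len: int = 3) -> dict[Itemset, float]:
    """Return itemsets whose expected support in the window is >= min_ratio * |window|."""
    txs = list(window)
    # Round-robin split; each partition stands in for one mapper's input split.
    partitions = [p for p in (txs[i::n_partitions] for i in range(n_partitions)) if p]
    # Pass 1: map = local mining, reduce = union of the local results.
    candidates: set[Itemset] = set()
    for p in partitions:
        candidates |= local_frequent(p, min_ratio, max_len)
    # Pass 2: map = local counting, reduce = sum per candidate, then global filter.
    totals: dict[Itemset, float] = defaultdict(float)
    for p in partitions:
        for c, v in count_candidates(p, candidates).items():
            totals[c] += v
    threshold = min_ratio * len(txs)
    return {c: v for c, v in totals.items() if v >= threshold}


# Usage: a count-based sliding window holding the 4 most recent transactions.
window = deque(maxlen=4)
stream = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 1.0},
    {"a": 1.0, "b": 0.5},
    {"b": 0.4, "c": 0.6},
    {"a": 0.8, "b": 0.9},
]
for tx in stream:
    window.append(tx)  # once full, the oldest transaction is evicted automatically

result = mine_window(window, n_partitions=2, min_ratio=0.5)
print(result)
```

In this window, ("c",) is locally frequent in the first partition and ("a",), ("b",) and ("a", "b") in the second, so all four become candidates; after pass 2 only ("a",), with an expected support of about 2.3, clears the global threshold of 2.0.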
