PFP: Parallel FP-Growth for Query Recommendation

Frequent itemset mining (FIM) is a useful tool for discovering items that frequently co-occur. Since its inception, a number of significant FIM algorithms have been developed to speed up mining. Unfortunately, when the dataset is huge, both the memory use and the computational cost can still be prohibitive. In this work, we propose to parallelize the FP-Growth algorithm on distributed machines; we call the parallel algorithm PFP. PFP partitions computation so that each machine executes an independent group of mining tasks. This partitioning eliminates computational dependencies between machines, and thereby communication between them. Through an empirical study on a large dataset of 802,939 Web pages and 1,021,107 tags, we demonstrate that PFP can achieve virtually linear speedup. Beyond scalability, the study shows that PFP is promising for supporting query recommendation in search engines.
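
The abstract states only that PFP partitions the work so that each machine runs an independent group of mining tasks; it does not spell out how. The following is a minimal single-process sketch of one way to obtain that independence. Items are counted, ranked into a frequency-ordered list (F-list), and split into groups. Each transaction then contributes, per group, one prefix ending at that group's last item, so every group can be mined without data from any other group. All function names and the round-robin grouping are hypothetical, and a recursive projected-database search is swapped in for the FP-tree that FP-Growth would build, purely to keep the example short.

```python
from collections import Counter, defaultdict


def parallel_count(shards):
    """Count item supports: each shard counts locally, then the counts are summed."""
    total = Counter()
    for shard in shards:  # on a cluster, each shard would be a separate map task
        local = Counter()
        for transaction in shard:
            local.update(set(transaction))  # an item counts once per transaction
        total += local  # the "reduce": add the per-shard counters
    return total


def build_flist(counts, min_support):
    """Frequent items in descending support order (ties broken by item)."""
    frequent = [item for item, c in counts.items() if c >= min_support]
    return sorted(frequent, key=lambda item: (-counts[item], item))


def group_dependent_transactions(transaction, rank, group_of):
    """For each group touched by the transaction, yield one prefix ending at
    the last (least frequent) item of that group."""
    items = sorted({i for i in transaction if i in rank}, key=rank.__getitem__)
    emitted = set()
    for j in range(len(items) - 1, -1, -1):
        g = group_of[items[j]]
        if g not in emitted:
            emitted.add(g)
            yield g, tuple(items[: j + 1])


def mine_group(group_items, prefixes, min_support):
    """Mine every frequent itemset whose least frequent item is in this group.
    Recursive projection stands in for the FP-tree to keep the sketch short."""
    results = {}

    def grow(suffix, db, anchors):
        counts = Counter()
        for t in db:
            counts.update(t)
        for item, support in counts.items():
            if support < min_support or (anchors is not None and item not in anchors):
                continue
            pattern = suffix | {item}
            results[pattern] = support
            # Conditional database: the part of each transaction before `item`.
            grow(pattern, [t[: t.index(item)] for t in db if item in t], None)

    grow(frozenset(), prefixes, set(group_items))
    return results


def pfp(shards, min_support, num_groups):
    counts = parallel_count(shards)
    flist = build_flist(counts, min_support)
    rank = {item: r for r, item in enumerate(flist)}
    # Hypothetical grouping: deal F-list items round-robin into num_groups groups.
    group_of = {item: r % num_groups for r, item in enumerate(flist)}

    # "Shuffle": route each group-dependent transaction to its group.
    by_group = defaultdict(list)
    for shard in shards:
        for transaction in shard:
            for g, prefix in group_dependent_transactions(transaction, rank, group_of):
                by_group[g].append(prefix)

    # Each group could run on its own machine with no communication; the
    # per-group results are disjoint, so merging them is a plain union.
    patterns = {}
    for g, prefixes in by_group.items():
        members = [item for item in flist if group_of[item] == g]
        patterns.update(mine_group(members, prefixes, min_support))
    return patterns


# Toy data: two shards, support threshold 2, two groups.
shards = [
    [["a", "b", "c"], ["a", "b"], ["b", "c"]],
    [["a", "c"], ["a", "b", "c", "d"]],
]
found = pfp(shards, min_support=2, num_groups=2)
for itemset, support in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

On the toy data the sketch finds seven itemsets, from the single items up to {a, b, c} with support 2, while d falls below the threshold. The result does not depend on `num_groups`, which only changes how the work is split; that is the property that lets each group run on a separate machine. Every itemset is reported by exactly one group, namely the group of its least frequent item.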
