An Efficient Load Balancing Multi-core Frequent Patterns Mining Algorithm

Mining frequent patterns from transactional databases is an important problem in data mining, and many methods have been proposed to solve it. However, the computation time still increases significantly as the data size grows, which makes parallel computing an attractive strategy. Researchers have proposed various parallel and distributed algorithms for cluster and grid systems, but the construction and maintenance costs of such systems are high. In this paper, we present a load-balancing frequent pattern mining algorithm for multi-core processors. The main goal of the proposed algorithm is to reduce the large number of duplicated candidates generated by a previous method. To evaluate its performance, we implemented the proposed algorithm along with previous methods for comparison. The experimental results show that our method reduces the computation time dramatically as the number of threads increases. Moreover, the workload was evenly distributed among the computing units.
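The abstract does not spell out the algorithm itself. As a rough illustration of the two ideas it names, avoiding duplicated candidates and spreading work evenly over cores, here is a minimal sketch. It assumes an Apriori-style level-wise search over a vertical tidset layout. The prefix-based join rule, the greedy longest-processing-time scheduler and all function names (`vertical`, `make_tasks`, `balance`, `extend`, `mine`) are hypothetical choices made for illustration; they are not the paper's method. Python threads are used only to show how work is dispatched. Because of CPython's GIL, an actual multi-core speedup would need processes or a compiled language.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def vertical(transactions):
    """Map each 1-itemset (a 1-tuple) to its tidset: ids of transactions containing it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault((item,), set()).add(tid)
    return tidsets


def make_tasks(frequent):
    """Group sorted k-itemsets by their (k-1)-prefix and emit one task (a, later)
    per itemset a, where `later` holds the siblings after a. Candidate a + b[-1]
    comes only from the unique pair (a, b) with a < b, so no candidate is
    generated twice, not even by different threads (hypothetical scheme)."""
    classes = {}
    for itemset in sorted(frequent):
        classes.setdefault(itemset[:-1], []).append(itemset)
    tasks = []
    for members in classes.values():
        for i, a in enumerate(members[:-1]):
            tasks.append((a, members[i + 1:]))
    return tasks


def balance(tasks, n_workers):
    """Greedy longest-processing-time-first: give the heaviest task (most joins)
    to the worker with the smallest accumulated load."""
    heap = [(0, w) for w in range(n_workers)]
    buckets = [[] for _ in range(n_workers)]
    for task in sorted(tasks, key=lambda t: len(t[1]), reverse=True):
        load, w = heapq.heappop(heap)
        buckets[w].append(task)
        heapq.heappush(heap, (load + len(task[1]), w))
    return buckets


def extend(bucket, frequent, min_sup):
    """Join each itemset with its later siblings; support = size of tidset intersection."""
    found = {}
    for a, later in bucket:
        for b in later:
            tids = frequent[a] & frequent[b]
            if len(tids) >= min_sup:
                found[a + b[-1:]] = tids
    return found


def mine(transactions, min_sup, n_workers=4):
    """Level-wise search; each level's joins are spread over n_workers threads."""
    level = {s: t for s, t in vertical(transactions).items() if len(t) >= min_sup}
    result = dict(level)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while level:
            buckets = balance(make_tasks(level), n_workers)
            # All workers read the same level dict; none of them modifies it.
            work = partial(extend, frequent=level, min_sup=min_sup)
            next_level = {}
            for part in pool.map(work, buckets):
                next_level.update(part)
            result.update(next_level)
            level = next_level
    return {itemset: len(tids) for itemset, tids in result.items()}
```

Consider the five transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c,d}` with `min_sup=3`. On this input, `mine` reports `a`, `b` and `c` with support 4 and `ab`, `ac` and `bc` with support 3. `abc` occurs in only two transactions, so it is dropped. Splitting each prefix class into per-itemset tasks, instead of assigning whole classes, keeps the first join from landing on a single worker. At that level, every 1-itemset shares the empty prefix.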
