Dynamic Load Balancing for Parallel Association Rule Mining on Heterogeneous PC Cluster Systems

We propose dynamic load balancing strategies for parallel association rule mining in a heterogeneous PC cluster environment. PC clusters have recently come to be regarded as one of the most promising platforms for heavily data-intensive applications such as decision support query processing and data mining. The development cycle of PC hardware is becoming extremely short, which results in heterogeneous systems, where the CPU clock rate, the performance and capacity of the disk drives, and so on differ among the component PCs. Heterogeneity is inevitable. Current algorithms, however, basically assume homogeneity, so if we naively apply them to a heterogeneous system, performance falls far below expectation. New methodologies are needed to handle heterogeneity. In this paper, we propose new dynamic load balancing methods for association rule mining that work on heterogeneous systems. Two strategies are proposed, called candidate migration and transaction migration. The first one is invoked initially. When the load imbalance cannot be resolved with the first method, the second one is employed, which is costly but more effective for strong imbalance. We have implemented them on a PC cluster system with two different types of PCs: one with Pentium Pro, the other with Pentium II. The experimental results confirm that the proposed approach can very effectively balance the workload among heterogeneous PCs.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.
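The two-stage policy described in the abstract (try candidate migration first, fall back to transaction migration only when the imbalance persists) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `skew` metric (slowest over fastest node time), the `threshold` value of 1.2, and the callback names `measure`, `migrate_candidates` and `migrate_transactions` are all assumptions introduced here for illustration, and the migration steps themselves are left abstract.

```python
from typing import Callable, Sequence


def skew(times: Sequence[float]) -> float:
    """Ratio of the slowest node's time to the fastest node's time.

    Hypothetical imbalance metric; the abstract does not define one.
    """
    fastest = min(times)
    return float("inf") if fastest <= 0 else max(times) / fastest


def rebalance(
    measure: Callable[[], Sequence[float]],
    migrate_candidates: Callable[[], None],
    migrate_transactions: Callable[[], None],
    threshold: float = 1.2,  # assumed tolerance, not taken from the paper
) -> str:
    """Escalate from the cheaper strategy to the costlier one only if needed.

    Returns which strategy was the last one applied.
    """
    if skew(measure()) <= threshold:
        return "none"
    migrate_candidates()  # first strategy, invoked initially
    if skew(measure()) <= threshold:
        return "candidate"
    migrate_transactions()  # second strategy: costly, for strong imbalance
    return "transaction"


if __name__ == "__main__":
    # Toy model (assumption): per-node time = assigned work / relative speed.
    speeds = [1.0, 2.0]
    work = [100.0, 100.0]

    def measure() -> list:
        return [w / s for w, s in zip(work, speeds)]

    def move_candidates() -> None:
        # Shift a limited amount of work from the slow node to the fast one.
        work[0] -= 20.0
        work[1] += 20.0

    def move_transactions() -> None:
        # Redistribute all work in proportion to node speed.
        total, total_speed = sum(work), sum(speeds)
        for i, s in enumerate(speeds):
            work[i] = total * s / total_speed

    print(rebalance(measure, move_candidates, move_transactions))
```

In the toy run, the limited candidate shift leaves the slow node noticeably behind, so the controller escalates to the second step. Keeping the migration steps as callbacks separates the escalation policy from how the cluster actually moves candidates or transactions, which the abstract does not specify.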
