Implementation and Evaluation of Parallel Data Mining on PC Cluster and Optimization of its Execution Environments

Personal computer/workstation clusters have been studied intensively in the field of parallel and distributed computing. From the viewpoint of applications, data-intensive applications such as data mining and ad hoc query processing in databases are considered very important for high-performance computing, alongside conventional scientific calculations. We have built and evaluated pilot PC cluster systems, in particular a SAN-connected PC cluster, and implemented parallel data mining on them. Several optimizations for executing this application, including dynamic data allocation, are discussed.

Keywords: PC cluster, Data Mining, Storage Area Network, Optimization, Dynamic data allocation.
