Improving the performance of a parallel data mining algorithm using communication message scheduling

In this paper, we propose several communication scheduling strategies that can increase the speed of parallel data mining algorithms. The proposed method is based on smart scheduling that enables a parallel data mining application to fully utilize the interconnection infrastructure for communication. Using simulation, we compare the proposed strategies with an algorithm that performs no communication scheduling on multicore cluster systems with 2 to 32 cores. The experimental results show that a significant performance improvement can be gained when smart communication scheduling is included in the data mining algorithm.
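
The abstract does not spell out the scheduling strategies themselves. As a rough illustration of what communication scheduling can mean for an all-to-all exchange, the Python sketch below builds a round-based pairwise schedule in which every process talks to exactly one partner per round, so no process is the target of two messages at once. It uses the well-known XOR pairing and assumes the process count is a power of two; the function name `xor_exchange_schedule` is hypothetical and is not taken from the paper.

```python
def xor_exchange_schedule(num_procs):
    """Build a pairwise exchange schedule (illustrative, not the paper's method).

    Returns a list of rounds. Each round is a list of (a, b) rank pairs with
    a < b, and every rank appears in exactly one pair per round.
    Assumption: num_procs is a power of two (e.g. 2, 4, 8, 16, 32).
    """
    if num_procs < 2 or num_procs & (num_procs - 1):
        raise ValueError("num_procs must be a power of two >= 2")

    rounds = []
    for step in range(1, num_procs):
        # In round `step`, rank r is paired with r XOR step. XOR is its own
        # inverse, so the partner of the partner is r again.
        pairs = [
            (rank, rank ^ step)
            for rank in range(num_procs)
            if rank < (rank ^ step)  # keep each pair once
        ]
        rounds.append(pairs)
    return rounds


if __name__ == "__main__":
    for round_no, pairs in enumerate(xor_exchange_schedule(4), start=1):
        print(f"round {round_no}: {pairs}")
```

With `p` processes the schedule has `p - 1` rounds, and across all rounds each pair of processes exchanges data exactly once. An unscheduled exchange, by contrast, may let several processes send to the same destination at the same time and contend for its link.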
