An Efficient Distributed Algorithm for Mining Association Rules

Association Rule Mining (ARM) is an active area of data mining research. However, most ARM algorithms are designed for a centralized environment in which no external communication is required. Distributed Association Rule Mining (DARM) algorithms aim to generate rules from datasets spread over different geographical sites; hence, they require external communication throughout the entire mining process. Directly applying sequential algorithms to distributed databases is not effective because doing so incurs a large communication overhead, so DARM algorithms must keep communication costs low. In this paper, a new solution is proposed that reduces the size of the messages exchanged between sites. The solution also reduces the average transaction size and the dataset size, which shortens scan time and substantially improves the performance of the proposed algorithm. Our performance study shows that this solution outperforms the direct application of a typical sequential algorithm.
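The abstract does not spell out the mechanics, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a count-exchange scheme in which sites share only support counts. Local data is shrunk in two ways. First, items found globally infrequent after the first pass are dropped. Second, identical reduced transactions are merged into one entry with a multiplicity. All function and type names (`local_item_counts`, `global_frequent_items`, `trim_site`, `count_pairs`, `global_frequent_pairs`, `Site`) are hypothetical. Message passing is simulated by summing counters in one process, and `min_support` is assumed to be a fraction of all transactions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical types for illustration: a site holds a list of transactions.
Transaction = FrozenSet[str]
Site = List[Transaction]


def local_item_counts(site: Site) -> Counter:
    """One local scan: support count of every single item at this site."""
    counts: Counter = Counter()
    for t in site:
        counts.update(t)
    return counts


def global_frequent_items(sites: List[Site], min_support: float) -> Set[str]:
    """Sum per-site item counts (standing in for the count messages each
    site would send) and keep items that meet the global support fraction."""
    total: Counter = Counter()
    for site in sites:
        total.update(local_item_counts(site))
    n = sum(len(site) for site in sites)
    return {item for item, c in total.items() if c >= min_support * n}


def trim_site(site: Site, frequent: Set[str]) -> Counter:
    """Drop globally infrequent items, then merge identical reduced
    transactions into a single entry with a multiplicity."""
    trimmed: Counter = Counter()
    for t in site:
        reduced = t & frequent  # frozenset & set -> frozenset
        # A transaction with fewer than 2 items cannot contain any pair,
        # so it is skipped when counting 2-itemsets.
        if len(reduced) >= 2:
            trimmed[reduced] += 1
    return trimmed


def count_pairs(trimmed: Counter) -> Counter:
    """Count 2-itemsets on the reduced, weighted transactions."""
    counts: Counter = Counter()
    for t, mult in trimmed.items():
        for pair in combinations(sorted(t), 2):
            counts[pair] += mult
    return counts


def global_frequent_pairs(
    sites: List[Site], min_support: float
) -> Dict[Tuple[str, str], int]:
    # The threshold uses the ORIGINAL transaction count, not the trimmed one.
    n = sum(len(site) for site in sites)
    frequent = global_frequent_items(sites, min_support)
    total: Counter = Counter()
    for site in sites:
        # Each site would send only its local pair counts.
        total.update(count_pairs(trim_site(site, frequent)))
    return {pair: c for pair, c in total.items() if c >= min_support * n}


if __name__ == "__main__":
    sites = [
        [frozenset("abc"), frozenset("abd"), frozenset("ae")],
        [frozenset("abc"), frozenset("bc"), frozenset("abcf")],
    ]
    print(global_frequent_pairs(sites, 0.6))
```

Merging after trimming is what shrinks the scanned data in this sketch. At the second example site, `abc` and `abcf` both reduce to `abc`, so three transactions collapse into two weighted entries. Subsequent passes therefore touch fewer and shorter records, while the exchanged messages carry only counts.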