Distributed Association Mining on Message Passing Systems

Association mining, which finds relationships between items in a dataset, has proven practical in business applications. Many companies apply association mining to market data to analyze consumers’ purchasing behavior. The Apriori algorithm is the most established algorithm for finding frequent itemsets in association mining. However, its running time is dominated by the number of candidate itemsets it must generate and count. Research to date has focused on discovering itemsets efficiently in large datasets; these improvements include optimized data structures, dataset partitioning, and parallel data mining. In this paper, we propose a distributed association mining algorithm for finding frequent itemsets. Unlike most existing distributed algorithms, which center on reducing the size of the dataset, our algorithm focuses on reducing the number of candidate itemsets. The work of generating candidate k-itemsets is distributed evenly across the nodes to balance the workload among processors. We also present a complexity analysis of the distributed algorithm.
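
To make the idea of spreading candidate k-itemset generation across nodes concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a hypothetical scheme in which the join pairs of frequent (k-1)-itemsets are dealt round-robin to nodes, every node holds a full copy of the transactions, and each node joins, prunes, and counts only its own share. The nodes are simulated sequentially in one process; on a real message-passing system each share would run on its own processor, with frequent itemsets broadcast and results gathered (for example with MPI collectives). The names `join_pairs`, `generate_on_node`, and `distributed_apriori` are illustrative only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def join_pairs(prev_frequent):
    """All pairs of frequent (k-1)-itemsets that share their first k-2 items."""
    prev = sorted(prev_frequent)
    pairs = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            if prev[i][:-1] == prev[j][:-1]:
                pairs.append((prev[i], prev[j]))
    return pairs


def generate_on_node(pairs, prev_set):
    """Apriori join + prune for one node's share of the join pairs."""
    candidates = []
    for a, b in pairs:
        cand = a + (b[-1],)  # sorted, because a < b and they share a prefix
        # Prune: every (k-1)-subset of a frequent k-itemset must be frequent.
        if all(sub in prev_set for sub in combinations(cand, len(cand) - 1)):
            candidates.append(cand)
    return candidates


def distributed_apriori(transactions, min_support, num_nodes):
    """Frequent itemsets (as sorted tuples), with candidate generation
    split round-robin over `num_nodes` simulated nodes (hypothetical scheme)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {(i,) for i in items if support((i,), transactions) >= min_support}
    all_frequent = set(frequent)
    while frequent:
        pairs = join_pairs(frequent)
        # Round-robin split: node shares differ in size by at most one pair.
        shares = [pairs[r::num_nodes] for r in range(num_nodes)]
        next_frequent = set()
        for share in shares:  # each iteration stands in for one node
            for cand in generate_on_node(share, frequent):
                if support(cand, transactions) >= min_support:
                    next_frequent.add(cand)
        all_frequent |= next_frequent
        frequent = next_frequent
    return all_frequent


transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
result = distributed_apriori(transactions, min_support=3, num_nodes=3)
# 8 itemsets: 4 single items and 4 pairs
print(sorted(result, key=lambda s: (len(s), s)))
```

Because each node's share of join pairs is fixed before any counting starts, the split is balanced by the number of pairs; a scheme that balances by estimated counting cost would need extra information, and this sketch makes no claim about which balancing rule the paper uses.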
