A New Dynamic Distributed Algorithm for Frequent Itemsets Mining

Mining for association rules between items in large transactional databases is a central problem in the field of knowledge discovery. It has crucial applications in decision support and marketing strategy. Both centralized Association Rules Mining (ARM) and Distributed Association Rules Mining (DARM) consist of two phases: frequent itemset extraction and strong rule generation. The most important part of ARM is Frequent Itemsets Mining (FIM), and because of its importance many algorithms have been implemented for it in recent years. In this paper, we focus on distributed Apriori-like frequent itemsets mining and propose a distributed algorithm, called New Dynamic Distributed Frequent Itemsets Mining (NDD-FIM), for geographically distributed data sets. NDD-FIM uses a merger site to reduce communication overhead and dynamically reduces the size of the dataset partitions. The experimental results show that our algorithm generates the support counts of candidate itemsets more quickly than other DARM algorithms and reduces the average transaction size, the dataset size, and the number of message exchanges.
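
To make the idea of Apriori-like mining with a merger site concrete, the following is a minimal sketch of a generic count-distribution scheme: each site counts candidate itemsets over its own partition, and a single merger sums the local counts and applies the global support threshold. It is not the NDD-FIM algorithm itself, and it omits the dynamic partition reduction described in the abstract. All function names (`local_counts`, `merge_counts`, `next_candidates`, `mine`) and the in-memory representation of sites as lists of transactions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, candidates):
    """At one site: count how many local transactions contain each candidate."""
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def merge_counts(per_site_counts, total_transactions, min_support):
    """At the (hypothetical) merger site: sum local counts, keep frequent itemsets."""
    total = Counter()
    for counts in per_site_counts:
        total.update(counts)  # Counter.update adds counts together
    threshold = min_support * total_transactions
    return {itemset: n for itemset, n in total.items() if n >= threshold}


def next_candidates(frequent, k):
    """Apriori step: size-k itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def mine(sites, min_support):
    """Level-wise mining over several sites; returns {itemset: global count}."""
    total_transactions = sum(len(site) for site in sites)
    all_items = sorted({item for site in sites for t in site for item in t})
    candidates = [frozenset([item]) for item in all_items]
    result = {}
    k = 1
    while candidates:
        # In a real deployment each call to local_counts would run remotely
        # and only the counts would travel to the merger site.
        per_site = [local_counts(site, candidates) for site in sites]
        frequent = merge_counts(per_site, total_transactions, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result
```

In this sketch only the per-candidate counts cross site boundaries, which is the usual motivation for routing them through a single merger rather than broadcasting them between every pair of sites.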
