A Distributed Algorithm for Fast Mining Frequent Patterns in Limited and Varying Network Bandwidth Environments

Data mining is a set of methods for extracting hidden information from data. It mainly includes frequent pattern mining, sequential pattern mining, classification, and clustering. Frequent pattern mining discovers correlations among sets of items in large databases. The rapid growth in data volume slows frequent pattern mining, so numerous studies have developed algorithms that run in distributed computing environments to accelerate the mining process. FLR-mining (Fast, Load balancing and Resource efficient mining algorithm) is one of the fastest of these methods, and it also takes load balancing and resource usage into account. FLR-mining can automatically determine the appropriate number of computing nodes. However, FLR-mining and other existing methods assume that network bandwidth is constant. In practical distributed and many-task computing systems, this assumption does not hold, because packet collisions among many simultaneously running mining tasks cause the available bandwidth to vary. A method that accounts for varying network bandwidth is therefore needed. In this study, we propose a method that rapidly mines frequent patterns under varying network bandwidth. The proposed method also determines the appropriate number of computing nodes, so that computing resources are used efficiently and the load is balanced. Empirical evaluation shows that the proposed method delivers excellent performance in terms of execution efficiency and load balancing.
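To make frequent pattern mining and its distribution across computing nodes concrete, here is a minimal Python sketch of levelwise (Apriori-style) support counting in which each simulated node counts candidate itemsets on its own partition of the transactions and a coordinator sums the counts, a simplified form of the classic count-distribution approach. It is not FLR-mining or the proposed method. The round-robin partitioning, the hypothetical function names (`mine_frequent_itemsets`, `count_partition`, `generate_candidates`), and the absolute `min_support` threshold are illustrative assumptions; the nodes run in a single process, and network bandwidth is not modeled.

```python
from collections import Counter
from itertools import combinations


def count_partition(partition, candidates):
    """Count, within one partition (one simulated node), the transactions
    that contain each candidate itemset."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def generate_candidates(frequent_prev, k):
    """Build k-itemsets from frequent (k-1)-itemsets, keeping only those whose
    (k-1)-subsets are all frequent (the Apriori pruning rule)."""
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent_prev
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def mine_frequent_itemsets(transactions, min_support, num_nodes):
    """Hypothetical helper: levelwise mining with count distribution.
    min_support is an absolute transaction count (assumed >= 1)."""
    # Assumption: transactions are split round-robin across simulated nodes.
    partitions = [transactions[i::num_nodes] for i in range(num_nodes)]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Each node counts its own partition; the coordinator sums the counts.
        total = Counter()
        for partition in partitions:
            total.update(count_partition(partition, candidates))
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return frequent


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    result = mine_frequent_itemsets(baskets, min_support=3, num_nodes=2)
    for itemset, support in sorted(
        result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
    ):
        print(sorted(itemset), support)
```

Because the local counts are summed before the support threshold is applied, the result does not depend on how many nodes the transactions are split across. In a real distributed deployment, exchanging these counts over the network is one place where the available bandwidth would affect running time.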
