A fast and distributed algorithm for mining frequent patterns in congested networks

With advances in technology, frequent pattern mining has come into wide use in everyday applications. It yields interesting or useful information that supports decision-making and judgment. For example, marketplace managers mine transaction data to improve services, understand customer buying habits, and determine a placement scheme for goods that increases profits; the technique is also applied in medicine and biotechnology. However, data is now generated so rapidly that it gives rise to Big Data problems. Many researchers have therefore studied distributed, parallel, and cloud computing technologies to find the most suitable approach. Distributed data mining, however, uses multiple computing nodes, which requires transmitting a considerable amount of data over the network. The available bandwidth is limited when many different tasks transmit at the same time and many servers work in the same network segment. This results in poor transmission and severe transfer delays, either within the network or outside it. We therefore propose FDMCN, a fast and distributed mining algorithm for discovering frequent patterns in congested networks, which is based on CARM. Its main purpose is to reduce FP-tree transmission so that each computing node requires only a portion of the information for mining. Empirical evaluation under various simulation conditions shows that FDMCN performs very well in execution efficiency and scalability compared with the PSWS algorithm.
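The abstract does not describe how FDMCN selects the portion of the FP-tree that each node receives, so the following is not the authors' method. It is a minimal sketch of the well-known FP-growth idea that mining the patterns ending in one item needs only that item's conditional pattern base (its prefix paths), not the whole tree. The function name `conditional_pattern_base` and the sample transactions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def conditional_pattern_base(transactions, item, min_support):
    """Return the prefix paths needed to mine patterns ending in `item`.

    Illustrative only: this is the generic FP-growth notion of a
    conditional pattern base, not the FDMCN transmission scheme.
    """
    # Support of each item = number of transactions containing it.
    counts = Counter(i for t in transactions for i in set(t))
    frequent = {i for i, c in counts.items() if c >= min_support}
    if item not in frequent:
        return Counter()

    # Usual FP-tree ordering: descending support, ties broken by name.
    ordered_items = sorted(frequent, key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(ordered_items)}

    base = Counter()
    for t in transactions:
        path = sorted((i for i in set(t) if i in frequent), key=rank.get)
        if item in path:
            prefix = tuple(path[:path.index(item)])
            if prefix:
                base[prefix] += 1  # identical prefixes share one entry
    return base


if __name__ == "__main__":
    # Hypothetical transaction data.
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"],
            ["a", "b", "c", "d"]]
    print(conditional_pattern_base(data, "c", min_support=2))
```

In a distributed setting, a structure like this is what a node would need in order to mine the patterns for its assigned item, which is typically much smaller than the full FP-tree.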
