A Scalable Approach for Improving Implementation of a Frequent Pattern Mining Algorithm using MapReduce Programming

A frequent pattern is a pattern (a set of items, a subsequence, a subgraph, etc.) that occurs frequently in a transactional database. Frequent pattern mining is valuable in domains such as market basket analysis, cross-marketing, and selling, where knowledge is extracted from transactional data. Many frequent itemset mining (FIM) algorithms have been developed to speed up mining since the problem was introduced. However, when the dataset is massive, mining can still be prohibitively expensive in terms of communication cost, memory usage, I/O utilization, and keeping the data distribution balanced. One existing frequent pattern mining algorithm, the CATS Tree (Compressed and Arranged Transaction Sequences tree), supports interactive mining with a single scan of the database. In this work, we propose to parallelize part of the CATS-Tree algorithm across distributed machines using MapReduce, in order to improve its overall performance on large transaction data. The algorithm partitions the computation so that each machine executes an independent group of mining tasks. We present a comparison based on time complexity, algorithm complexity, and performance on different types of datasets. The results show that the proposed parallel implementation of CATS-Tree provides better performance for massive datasets.
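To illustrate the general idea of partitioning a mining task so that each machine works independently, the following is a minimal sketch of MapReduce-style support counting for single items. It is not the CATS-Tree construction or the parallelization proposed here; it simulates the map and reduce phases in one process, and the function names (`map_phase`, `reduce_phase`, `frequent_items`), the sample transactions, and the `min_support` threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(partition):
    """Map step: emit (item, 1) once per distinct item in each transaction.

    Each partition is processed independently, as it would be on a
    separate machine in a real MapReduce job.
    """
    for transaction in partition:
        for item in set(transaction):
            yield item, 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each item."""
    counts = defaultdict(int)
    for item, n in pairs:
        counts[item] += n
    return dict(counts)


def frequent_items(partitions, min_support):
    """Keep the items whose total support reaches min_support."""
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    counts = reduce_phase(mapped)
    return {item: c for item, c in counts.items() if c >= min_support}


# Hypothetical transactional data, already split into two partitions.
partitions = [
    [["bread", "milk"], ["bread", "butter"]],
    [["milk", "butter", "bread"], ["milk"]],
]

result = frequent_items(partitions, min_support=3)
print(result)
```

In an actual cluster deployment the map calls would run on separate machines and a shuffle step would route each item's pairs to one reducer; the sketch collapses those stages into local function calls to keep the data flow visible.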
