Using parallel approach in pre-processing to improve frequent pattern growth algorithm

Mining frequent itemsets is an important step in the association rule mining process. In this paper, we apply a parallel approach in the pre-processing step itself to make the dataset favorable for mining frequent itemsets and hence improve mining speed and the use of computational resources. Because of the data explosion, it is necessary to develop systems that can handle data at scale. Many efficient sequential and parallel algorithms have been proposed in recent years, and we first explore some of the major algorithms proposed for mining frequent itemsets. Our algorithm gains its efficiency from sorting the dataset in parallel during the pre-processing step and pruning the infrequent itemsets. Owing to the drastic improvement in computer architectures and performance over the years, high-performance computing is gaining importance, and we use one such technique in our implementation: CUDA.
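The pre-processing idea summarized above can be illustrated with a minimal CPU sketch. It is not the CUDA implementation described in this paper: the function names `count_support` and `preprocess`, the `min_support` and `workers` parameters, and the use of a thread pool are all assumptions made for illustration, and pruning is shown only at the level of single items. The point is that each transaction can be pruned and sorted independently of the others, which is what makes this step a natural candidate for parallel execution.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_support(transactions):
    """Count in how many transactions each item appears."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # set(): count an item once per transaction
    return counts


def preprocess(transactions, min_support, workers=4):
    """Prune infrequent items, then sort each transaction by descending support.

    Hypothetical helper: the thread pool only illustrates that transactions
    are independent work units; it is a stand-in for a GPU kernel.
    """
    support = count_support(transactions)
    frequent = {item for item, count in support.items() if count >= min_support}

    def clean(transaction):
        kept = {item for item in transaction if item in frequent}
        # Ties in support are broken by item name to keep the order deterministic.
        return sorted(kept, key=lambda item: (-support[item], item))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        cleaned = list(pool.map(clean, transactions))

    # Transactions that contained no frequent item are dropped.
    return [transaction for transaction in cleaned if transaction]


if __name__ == "__main__":
    data = [["b", "a", "c"], ["a", "b"], ["a", "d"], ["e"]]
    print(preprocess(data, min_support=2))
    # prints [['a', 'b'], ['a', 'b'], ['a']]
```

Ordering items by descending support is the ordering FP-growth conventionally uses before building its prefix tree, since it lets transactions share longer prefixes. In a GPU setting, the per-transaction `clean` step would be replaced by a device-side sort.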
