HP-Apriori: Horizontal parallel-Apriori algorithm for frequent itemset mining from big data

Because of the scale and complexity of big data, mining it on a single personal computer is difficult. As databases grow, parallel computing systems offer considerable advantages to data mining applications by allowing mining algorithms to be parallelized. Parallelizing association rule mining algorithms is therefore an important task for mining frequent patterns from transaction databases. Existing parallel algorithms either distribute the database horizontally or increase the number of CPUs to reduce the execution time of frequent pattern mining. In this paper, a novel frequent itemset mining algorithm, Horizontal parallel-Apriori (HP-Apriori), is proposed. It divides the database both horizontally and vertically and partitions the mining process into four sub-processes, so that all four tasks are performed in parallel. HP-Apriori also tries to speed up mining by using an index file that is generated in the first step of the algorithm. The proposed algorithm is compared with Count Distribution (CD) in terms of execution time and speedup on four real datasets. Experimental results show that HP-Apriori outperforms CD, with lower execution time and higher speedup, and with high scalability.
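For orientation, the horizontal-partitioning idea behind the Count Distribution baseline can be sketched as follows. This is a minimal illustration, not the authors' HP-Apriori: it omits the vertical split, the four sub-processes, and the index file, whose details the abstract does not give. The function names, the round-robin split, and the use of threads (which show the data flow but not real CPU parallelism in CPython) are all assumptions made for the example. The `speedup` helper uses the conventional definition, serial time divided by parallel time, which is assumed to be the criterion meant above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def split_horizontally(transactions, n_parts):
    """Split the transaction list row-wise into n_parts chunks (round-robin)."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def count_local(partition, candidates):
    """Count how often each candidate itemset occurs in one partition."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for candidate in candidates:
            if items.issuperset(candidate):
                counts[candidate] += 1
    return counts


def generate_candidates(frequent_prev, k):
    """Extend frequent (k-1)-itemsets by one item; keep a k-itemset only if
    all of its (k-1)-subsets are frequent (the Apriori pruning rule)."""
    items = sorted({item for itemset in frequent_prev for item in itemset})
    candidates = set()
    for itemset in frequent_prev:
        for item in items:
            if item in itemset:
                continue
            candidate = tuple(sorted(itemset + (item,)))
            if all(sub in frequent_prev
                   for sub in combinations(candidate, k - 1)):
                candidates.add(candidate)
    return candidates


def parallel_apriori(transactions, min_support, n_workers=4):
    """Level-wise Apriori: each worker counts the same candidates on its own
    partition, and the local counts are summed into global supports."""
    partitions = split_horizontally(transactions, n_workers)
    candidates = {(item,) for t in transactions for item in t}
    result = {}
    k = 1
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while candidates:
            partials = pool.map(count_local, partitions,
                                [candidates] * len(partitions))
            total = Counter()
            for partial in partials:
                total.update(partial)  # sum local counts across partitions
            frequent = {c: n for c, n in total.items() if n >= min_support}
            result.update(frequent)
            k += 1
            candidates = generate_candidates(set(frequent), k)
    return result


def speedup(t_serial, t_parallel):
    """Conventional speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel
```

Calling `parallel_apriori(transactions, min_support)` on a list of item lists returns a dictionary that maps each frequent itemset (a sorted tuple) to its absolute support count. In this sketch every worker scans only its own rows, and only the per-candidate counts are combined.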
