Parallelization of association rule mining: Survey

In today's big data era, modern applications generate and collect large amounts of data. As a result, data mining faces new challenges and opportunities in designing algorithms that can transform this voluminous data into actionable knowledge effectively and efficiently. Traditional algorithms were designed to run sequentially on a single machine. However, as the volume of data increases, the computational cost of processing it also increases. This makes analysing data on a single sequential machine problematic: instead of assisting in data analysis, the processor becomes a bottleneck. Parallel and distributed approaches improve performance in terms of both computational cost and scalability, but they run into limitations in load balancing, data partitioning, job assignment, monitoring, and similar tasks. MapReduce, a parallel programming model, is a relatively new concept that offers seemingly unlimited computing power and cheap storage and can overcome the above limitations. This makes it a topic of growing research interest. This survey gives a detailed literature review of some existing methods, along with their pros and cons.
