Paper information - Parallelization of association rule mining: Survey

Parallelization of association rule mining: Survey

In today's big data era, modern applications generate and collect large amounts of data. As a result, data mining faces new challenges and opportunities in designing algorithms that can transform this voluminous data into actionable knowledge effectively and efficiently. Traditional algorithms were designed to run sequentially on a single machine. However, as the volume of data increases, the computational cost of processing it also increases. This makes it difficult to analyse data on a single sequential machine: instead of assisting the analysis, the processor becomes a bottleneck. Parallel and distributed approaches improve performance in terms of both computational cost and scalability, but they face limitations in load balancing, data partitioning, job assignment, monitoring, and so on. MapReduce, a parallel programming model, is a newer approach that offers seemingly unlimited computing power and cheap storage and can overcome these limitations, which makes it a topic of growing research interest. This survey gives a detailed literature review of some existing methods, along with their pros and cons.

Durga Toshniwal | Shivani Sharma
