IOMRA - A High Efficiency Frequent Itemset Mining Algorithm Based on the MapReduce Computation Model

The goal of Frequent Itemset Mining (FIM) is to find all itemsets whose frequency (support) in a large transaction database meets a user-specified minimum threshold. Previous studies exploited multicore computing to sharply reduce the execution time of the Apriori algorithm. However, once a data set grows beyond terabytes, a single host can no longer handle the volume of computation, and connecting many computers so that they operate as a supercomputer becomes the obvious way to speed up execution. Several parallel Apriori algorithms based on the MapReduce framework have been proposed, but with these algorithms memory is quickly exhausted and communication costs rise sharply, which greatly reduces execution efficiency. In this paper, we present IOMRA, an improved Apriori algorithm that uses the length of each transaction to determine the maximum size of the candidate itemsets to be merged. By reducing the production of low-frequency itemsets in the Map function, IOMRA alleviates memory exhaustion and greatly improves execution efficiency.
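The abstract outlines a general pattern: Map tasks count candidate itemsets per transaction, Reduce tasks sum those counts, and transaction length limits how large a candidate can be. The Python sketch below is a minimal, single-process simulation of a generic level-wise Apriori on MapReduce, not the authors' IOMRA. Its length-based rules (a mapper skips any transaction with fewer than k distinct items, and the longest transaction bounds the number of passes) are an assumed reading of the abstract. The names `apriori_gen`, `map_phase`, `reduce_phase`, and `mine_frequent_itemsets`, and the parameter `min_support` (an absolute count), are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemsets and prune any candidate
    that has an infrequent (k-1)-subset (the Apriori property)."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = tuple(sorted(set(a) | set(b)))
            if len(union) == k and all(
                sub in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def map_phase(transaction, k, candidates):
    """Hypothetical mapper for pass k: emit (itemset, 1) for each candidate
    k-itemset contained in one transaction. candidates=None counts every item."""
    items = sorted(set(transaction))
    # Assumed length-based filter: a transaction with fewer than k distinct
    # items cannot contain a k-itemset, so the mapper emits nothing for it.
    if len(items) < k:
        return
    for subset in combinations(items, k):
        if candidates is None or subset in candidates:
            yield subset, 1


def reduce_phase(pairs, min_support):
    """Hypothetical reducer: sum counts per itemset, keep those >= min_support."""
    counts = defaultdict(int)
    for itemset, count in pairs:
        counts[itemset] += count
    return {s: c for s, c in counts.items() if c >= min_support}


def mine_frequent_itemsets(transactions, min_support):
    """Run one simulated MapReduce job per itemset size k."""
    # The longest transaction bounds the largest itemset worth counting.
    max_len = max((len(set(t)) for t in transactions), default=0)
    result = {}
    candidates = None  # pass 1 counts every individual item
    for k in range(1, max_len + 1):
        # Stand-in for the shuffle: stream every mapper's output to the reducer.
        pairs = (p for t in transactions for p in map_phase(t, k, candidates))
        frequent = reduce_phase(pairs, min_support)
        if not frequent:
            break
        result.update(frequent)
        candidates = apriori_gen(frequent, k + 1)
        if not candidates:
            break
    return result


if __name__ == "__main__":
    db = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, support in sorted(mine_frequent_itemsets(db, 3).items()):
        print(itemset, support)
```

On a real Hadoop cluster, each `map_phase` call would run inside a mapper and the framework's shuffle would group the emitted pairs by itemset before they reach the reducer; here a generator expression stands in for that shuffle.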

Kun-Ming Yu | Sheng-Hui Liu | Shi-Jia Liu | Shi-Xuan Chen
