Complex statistical analysis of big data: Implementation and application of Apriori and FP-Growth algorithm based on MapReduce

In a single-machine environment, the Apriori and FP-Growth algorithms suffer from high memory consumption, low computing performance, and poor scalability and reliability when mining association rules from large-scale data. We therefore propose a new implementation, based on the MapReduce parallel environment, for mining frequent itemsets to generate association rules. We verify it on real datasets of different sizes and on clusters with different numbers of nodes, using speedup, scalability, and reliability as indicators. The results show that the method is feasible and valid, and that it improves the overall performance and efficiency of the Apriori and FP-Growth algorithms enough to meet the needs of large-scale association rule mining.
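To illustrate how frequent-itemset counting fits the MapReduce model, the following is a minimal single-process sketch in Python, not the implementation described above. It assumes a simplified counting pass: the map step emits every k-item subset of a transaction, and the reduce step sums the counts and applies a minimum support threshold. It omits Apriori's candidate pruning and FP-Growth's tree construction, and the function names (`map_phase`, `reduce_phase`, `frequent_itemsets`) and the sample transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k):
    """Emit (itemset, 1) for every k-item subset of one transaction."""
    # Sorting gives each itemset a canonical tuple form, so the same
    # itemset from different transactions maps to the same key.
    for itemset in combinations(sorted(set(transaction)), k):
        yield itemset, 1


def reduce_phase(pairs, min_support):
    """Sum the counts per itemset and keep those meeting min_support."""
    counts = defaultdict(int)
    for itemset, count in pairs:
        counts[itemset] += count
    return {s: c for s, c in counts.items() if c >= min_support}


def frequent_itemsets(transactions, k, min_support):
    """Simulate one map/reduce round in a single process."""
    # In a real cluster the map calls would run on separate nodes and the
    # framework would group the emitted pairs by key before reducing.
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    return reduce_phase(pairs, min_support)


# Hypothetical toy data: four market-basket transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

result = frequent_itemsets(transactions, k=2, min_support=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

Because each transaction is mapped independently, the map step can be distributed across nodes without coordination, which is the property a MapReduce-based approach relies on for speedup as the cluster grows.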
