Paper information - Mining algorithm for association rules in big data based on Hadoop

Mining algorithm for association rules in big data based on Hadoop

Traditional association rule mining algorithms can no longer meet the efficiency and scalability demands of mining large volumes of data. To address this problem, the paper takes FP-Growth as an example and parallelizes it on the Hadoop framework using the MapReduce model. On this basis, the algorithm is further improved with a transaction reduction method to raise its mining efficiency. Experiments on a Hadoop cluster evaluate both the mining results and the efficiency: they verify the parallel mining results, compare serial and parallel efficiency, and examine how mining time varies with the number of nodes and with the amount of data. The results show that the parallel FP-Growth implementation mines frequent itemsets accurately, with better performance and scalability. It is therefore better suited to the requirements of big data mining and can efficiently mine frequent itemsets and association rules from large datasets.
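
The abstract does not spell out how the transaction reduction step works, so the following is only a minimal sketch under stated assumptions. It assumes that reduction means dropping infrequent items from every transaction and discarding transactions that become empty, which is the usual preprocessing before an FP-tree is built. The map and reduce phases are simulated in plain Python rather than run on Hadoop, and the names `map_count`, `reduce_count`, and `reduce_transactions` are hypothetical, not taken from the paper.

```python
from collections import Counter
from itertools import chain


def map_count(transaction):
    """Map step (simulated): emit (item, 1) for each distinct item."""
    return [(item, 1) for item in set(transaction)]


def reduce_count(pairs):
    """Reduce step (simulated): sum the emitted counts per item."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return counts


def reduce_transactions(transactions, min_support):
    """Assumed transaction reduction: keep only frequent items.

    Returns the frequent-item counts and the reduced transactions,
    each sorted by descending support (ties broken alphabetically),
    which is the item order FP-Growth uses when building its tree.
    """
    pairs = chain.from_iterable(map_count(t) for t in transactions)
    counts = reduce_count(pairs)
    frequent = {i: n for i, n in counts.items() if n >= min_support}

    reduced = []
    for t in transactions:
        kept = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        if kept:  # transactions with no frequent items are dropped
            reduced.append(kept)
    return frequent, reduced


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["e"]]
    frequent, reduced = reduce_transactions(data, min_support=2)
    print(frequent)
    print(reduced)
```

In a real Hadoop job, the counting pass would be one MapReduce job and the reduced transactions would feed a second job that builds and mines the conditional FP-trees; the sketch covers only the first pass and the pruning.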

[1] Mohammed J. Zaki. Scalable Algorithms for Association Mining, 2000, IEEE Trans. Knowl. Data Eng.

[2] Goran Nenadic, et al. A Parallel Distributed Weka Framework for Big Data Mining Using Spark, 2015, 2015 IEEE International Congress on Big Data.

[3] Zhijian Wang, et al. The Parallel Improved Apriori Algorithm Research Based on Spark, 2015, 2015 Ninth International Conference on Frontier of Computer Science and Technology.

[4] Hannu Toivonen, et al. Sampling Large Databases for Association Rules, 1996, VLDB.

[5] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[6] Rakesh Agrawal, et al. Parallel Mining of Association Rules, 1996, IEEE Trans. Knowl. Data Eng.

[7] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD 2000.