Cloud computing provides cheap and efficient solutions for storing and analyzing massive data sets. Studying data mining strategies based on cloud computing is therefore important from both a theoretical and a practical point of view. This paper focuses on a strategy for mining association rules in a cloud computing environment. First, cloud computing, Hadoop, the MapReduce programming model, the Apriori algorithm, and parallel association rule mining algorithms are introduced. Then, a parallel association rule mining strategy adapted to the cloud computing environment is designed. It comprises a data set division method, a data set allocation method, an improved Apriori algorithm, and the procedure for implementing the improved Apriori algorithm on MapReduce. Finally, a Hadoop platform is built and experiments are conducted to test the performance of both the strategy and the improved algorithm. The results show that the proposed strategy achieves higher efficiency when mining frequent item sets in a cloud computing environment.
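To make the idea of running Apriori on MapReduce concrete, the following is a minimal sketch in plain Python. It assumes the classical level-wise Apriori with per-partition counting, not the improved algorithm or the division and allocation methods designed in this paper, whose details are not given here. The function names (`map_count`, `reduce_counts`, `next_candidates`, `mine`) are hypothetical, and ordinary function calls stand in for Hadoop map and reduce tasks.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map step (assumed): count each candidate item set within one partition."""
    local = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:  # candidate is contained in the transaction
                local[cand] += 1
    return local


def reduce_counts(local_counts, min_support):
    """Reduce step (assumed): sum partition counts, keep frequent item sets."""
    total = Counter()
    for local in local_counts:
        total.update(local)
    return {c: n for c, n in total.items() if n >= min_support}


def next_candidates(frequent, k):
    """Build k-item candidates from items still present in frequent sets,
    keeping only those whose (k-1)-item subsets are all frequent."""
    items = sorted({i for s in frequent for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    ]


def mine(partitions, min_support):
    """Run one map/reduce pass per level until no candidates remain."""
    items = sorted({i for p in partitions for t in p for i in t})
    candidates = [frozenset([i]) for i in items]
    result, k = {}, 1
    while candidates:
        local_counts = [map_count(p, candidates) for p in partitions]
        frequent = reduce_counts(local_counts, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result
```

Calling `mine(partitions, min_support)` with a list of partitions, each a list of transactions, returns a dictionary that maps every frequent item set to its global support count. In a real Hadoop job each partition would be an input split and the candidate list would be shipped to every mapper; this sketch only mirrors that data flow in a single process.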