Data mining is the process of extracting potentially useful information from raw data in order to improve the quality of information services. With the rapid development of the Internet, data volumes have grown from the KB level to the TB and even PB level, and the objects of data mining have become increasingly complex, so data mining algorithms need to be more efficient. Cloud computing can provide the infrastructure for mining massive and complex data, and it also raises new research challenges for data mining. This paper first introduces the basic concepts of cloud computing and data mining and outlines how data mining is used in cloud computing. It then summarizes research on parallel programming models, with particular attention to the MapReduce programming model and its development platform, Hadoop. Finally, it surveys efficient massive data mining algorithms based on parallel programming models and massive data mining services based on cloud computing.
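
As a rough illustration of the MapReduce programming model mentioned above, the following minimal single-process sketch counts item support over a set of transactions, a basic step in frequent-itemset mining. It is not taken from the paper: the function names (`map_phase`, `shuffle`, `reduce_phase`, `item_support`) and the sample transactions are hypothetical, and a real Hadoop job would distribute the map and reduce phases across cluster nodes rather than run them in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(transaction):
    """Map step: emit an (item, 1) pair for each distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts for one item to obtain its support."""
    return key, sum(values)


def item_support(transactions):
    """Run map, shuffle, and reduce sequentially over all transactions."""
    mapped = chain.from_iterable(map_phase(t) for t in transactions)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample data, for illustration only.
transactions = [["milk", "bread"], ["milk", "eggs"], ["bread", "milk", "eggs"]]
support = item_support(transactions)
```

Because each transaction is mapped independently and each key is reduced independently, both phases can in principle run in parallel, which is the property that makes the model attractive for mining large data sets.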