Paper Information - Clustering Algorithms in MapReduce: A Review

Clustering Algorithms in MapReduce: A Review

MapReduce is a framework for processing very large amounts of unstructured data in parallel across a distributed cluster of processors or individual computers. It is mostly used to analyze large datasets in clustered environments, and it has become a dominant parallel computing paradigm for big data. This paper describes well-known strategies in MapReduce and presents a comprehensive comparison of algorithms in MapReduce in a clustering environment.

Vinod S. Bawane | Sandesha M. Kale

[1] Anjan K. Koundinya, et al. MapReduce Design of K-Means Clustering Algorithm, 2013, 2013 International Conference on Information Science and Applications (ICISA).

[2] Rong Ge, et al. Improving MapReduce energy efficiency for computation intensive workloads, 2011, 2011 International Green Computing Conference and Workshops.

[3] Di Ma, et al. MR-DBSCAN: An Efficient Parallel Density-Based Clustering Algorithm Using MapReduce, 2011, 2011 IEEE 17th International Conference on Parallel and Distributed Systems.

[4] Kunle Olukotun, et al. Map-Reduce for Machine Learning on Multicore, 2006, NIPS.

[5] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[6] Qi Zhang, et al. A MapReduce-Based Architecture for Rule Matching in Production System, 2010, 2010 IEEE Second International Conference on Cloud Computing Technology and Science.

[7] Guoping Wang, et al. Multi-Query Optimization in MapReduce Framework, 2013, Proc. VLDB Endow.