Clustering Algorithms in MapReduce: A Review

MapReduce is a framework for processing very large amounts of unstructured data in parallel across a distributed cluster of processors or standalone computers. It is widely used to analyze large datasets in clustering environments and has become a dominant parallel computing paradigm for big data. This paper describes well-known strategies in MapReduce and presents a comprehensive comparison of MapReduce algorithms in clustering environments.
