MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce

Digital data is now generated and exchanged faster than ever before. This data is of little use until useful content is extracted from it, yet traditional database management techniques are impractical and inefficient on big data. This is why big data technologies such as Hadoop came into existence. Hadoop is an open-source framework that can process huge amounts of data in parallel. Data mining techniques can be used to extract useful information, and among them clustering is one of the most popular. Clustering groups similar data together, while dissimilar data is placed in different groups. K-Means is one such clustering technique. Traditional K-Means tries to assign n data objects to k clusters, starting from random initial centers. Experiments show that data mining results are inefficient and unstable when random initial centers are used. In this paper, we modify the traditional K-Means clustering algorithm by using improved initial centers. We propose several methods for calculating the initial centers and compare their results.
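To make the role of the initial centers concrete, the following is a minimal single-machine sketch of K-Means written as a map step (assign each point to its nearest center) and a reduce step (average the points of each cluster). It is not the paper's implementation and does not use Hadoop. The abstract does not say which initialization methods were proposed, so `farthest_point_centers` is a hypothetical stand-in heuristic shown only for comparison with `random_centers`; all function names here are assumptions made for illustration.

```python
import random
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_centers(points, k, seed=0):
    """Traditional initialization: pick k distinct points at random."""
    return random.Random(seed).sample(points, k)


def farthest_point_centers(points, k):
    """Hypothetical stand-in for an 'improved' initialization (assumption,
    not the paper's method): start from the first point, then repeatedly
    add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centers),
        )
        centers.append(farthest)
    return centers


def map_phase(points, centers):
    """Map: emit (index of nearest center, point) for every point."""
    for p in points:
        nearest = min(
            range(len(centers)),
            key=lambda i: squared_distance(p, centers[i]),
        )
        yield nearest, p


def reduce_phase(pairs, centers):
    """Reduce: group points by center index and average each group.
    A center that received no points keeps its previous position."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = list(centers)
    for idx, members in groups.items():
        dim = len(members[0])
        new_centers[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Repeat map and reduce until the centers stop changing.
    Returns the final centers and the number of iterations used."""
    centers = list(centers)
    for iteration in range(1, max_iter + 1):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        if new_centers == centers:
            return centers, iteration
        centers = new_centers
    return centers, max_iter


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(data, farthest_point_centers(data, 2)))
    print(kmeans(data, random_centers(data, 2)))
```

In a Hadoop job the map and reduce functions would run on separate nodes and the loop in `kmeans` would become a sequence of jobs, one per iteration, so an initialization that converges in fewer iterations saves whole MapReduce passes.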
