Analysis of K-Means and K-Medoids Algorithm For Big Data

Clustering plays a vital role in exploring data, making predictions, and handling anomalies in the data. Data objects that share similar characteristics are grouped into clusters using iterative techniques. As real-world data grows day by day, clustering makes it possible to find interesting patterns in very large datasets for which little or no background knowledge is available. In this paper, the two most popular clustering algorithms, K-Means and K-Medoids, are evaluated on the transaction10k dataset from KEEL. The input to these algorithms is a set of randomly distributed data points, and clusters are generated based on their similarity. The comparison results show that the time taken for cluster head selection and the space complexity of cluster overlapping are much better in K-Medoids than in K-Means. K-Medoids is also better in terms of execution time, is not sensitive to outliers, and reduces noise compared to K-Means, because it minimizes the sum of dissimilarities of data objects.
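The difference in outlier sensitivity can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the toy points, the helper names `mean_center` and `medoid`, and the use of Euclidean distance as the dissimilarity measure are all assumptions made for illustration. The sketch computes the center of a single cluster in both ways, as the coordinate-wise mean (the K-Means update) and as the actual data point with the smallest sum of dissimilarities to the others (the K-Medoids choice).

```python
import math

def mean_center(points):
    """K-Means style center: coordinate-wise mean of the cluster."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))

def medoid(points):
    """K-Medoids style center: the member point that minimizes the
    sum of dissimilarities (here, Euclidean distances) to all points."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Hypothetical toy cluster: four nearby points plus one outlier.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (50.0, 50.0)]

centroid = mean_center(cluster)      # pulled toward the outlier
center_medoid = medoid(cluster)      # stays among the dense points

print("mean center:", centroid)
print("medoid:     ", center_medoid)
```

In this toy cluster the mean is dragged far away from the four nearby points by the single outlier, while the medoid remains one of those four points, which is the intuition behind the robustness claim above.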