Visual Analytics for Comparing the Impact of Outliers in k-Means and k-Medoids Algorithm

Clustering is an unsupervised machine learning approach that assigns data points to clusters according to a similarity or dissimilarity criterion. K-Means and K-Medoids are well-known clustering algorithms that are widely used across application areas of machine learning. K-Means is sensitive to outliers because each cluster center is a mean, which outliers pull toward themselves. K-Medoids instead represents each cluster by its medoid, the most centrally located data point in the cluster, and is therefore less affected. In this paper, the two algorithms are compared on the Iris dataset to evaluate the impact of outliers on their performance. An interactive web application with visual analytics has also been developed to display the impact of outliers on both clustering algorithms and give better insight into their behavior. The application can be accessed through a web browser.
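The difference between a mean and a medoid under an outlier can be illustrated with a minimal sketch. This is not the paper's application or experiment: the toy points, the single outlier at (10, 10), and the helper names `centroid` and `medoid` are assumptions made only for illustration, and Euclidean distance is assumed as the dissimilarity measure.

```python
import numpy as np


def centroid(points):
    """Cluster representative used by K-Means: the coordinate-wise mean."""
    return points.mean(axis=0)


def medoid(points):
    """Cluster representative used by K-Medoids: the member point with the
    smallest total Euclidean distance to all other members."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return points[dists.sum(axis=1).argmin()]


# Hypothetical toy cluster (not taken from the Iris dataset).
cluster = np.array([
    [1.0, 1.0],
    [1.2, 0.9],
    [0.8, 1.1],
    [1.0, 1.2],
    [1.1, 1.0],
])

# One hypothetical outlier far away from the cluster.
outlier = np.array([[10.0, 10.0]])
with_outlier = np.vstack([cluster, outlier])

# How far each representative moves once the outlier is added.
mean_shift = np.linalg.norm(centroid(with_outlier) - centroid(cluster))
medoid_shift = np.linalg.norm(medoid(with_outlier) - medoid(cluster))

print(f"mean shift:   {mean_shift:.3f}")
print(f"medoid shift: {medoid_shift:.3f}")
```

In this sketch the mean is dragged toward the outlier, while the medoid has to remain one of the actual data points, so it can at most switch to another member of the dense group and moves far less.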
