Effect of Different Distance Measures on the Performance of K-Means Algorithm: An Experimental Study in Matlab

The k-means algorithm is a popular clustering algorithm known for its simplicity. The distance measure plays an important role in its performance. Many distance measures are available, but the appropriate choice depends on the type of data being clustered. In this paper, an experimental study is carried out in Matlab to cluster the iris and wine data sets using different distance measures and to observe the resulting variation in performance.
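
To illustrate how the distance measure enters the algorithm, the following is a minimal sketch in Python rather than Matlab, and it is not the code used in the study. The function names (`euclidean`, `cityblock`, `kmeans`), the six toy points, and the choice of initial centroids are all assumptions made for illustration; they are not the iris or wine data. The sketch swaps the distance used in the assignment step and, for the city-block case, uses the component-wise median as the cluster center, which is the center that minimizes the sum of city-block distances.

```python
from statistics import mean, median


def euclidean(a, b):
    """Euclidean (L2) distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cityblock(a, b):
    """City-block (L1, Manhattan) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points, init_centroids, dist, center=mean, max_iter=100):
    """Lloyd-style iteration with a pluggable distance and center function.

    `center` is applied per coordinate to the members of each cluster:
    `mean` pairs naturally with Euclidean distance, `median` with city-block.
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid under `dist`.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed, so we have converged
            break
        labels = new_labels
        # Update step: recompute each non-empty cluster's center.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(center(col) for col in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
init = [points[0], points[3]]  # assumed deterministic initialization

labels_e, centroids_e = kmeans(points, init, euclidean, center=mean)
labels_c, centroids_c = kmeans(points, init, cityblock, center=median)
print(labels_e, labels_c)
```

On well-separated data such as this toy set, both distance measures produce the same partition. Differences of the kind the study examines would be expected to appear only on data where cluster boundaries are less clear or where features differ in scale.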
