Standardization and Its Effects on K-Means Clustering Algorithm

Data clustering is an important data exploration technique with many applications in data mining. K-means is one of the best-known data mining methods; it partitions a dataset into groups of patterns, and many methods have been proposed to improve its performance. Standardization is a central preprocessing step in data mining: it brings the values of features or attributes that have different dynamic ranges onto a common scale. In this paper, we analyze the performance of three standardization methods (min-max, z-score, and decimal scaling) on the conventional K-means algorithm. Comparing the results on infectious disease datasets, we found that z-score standardization gave more effective and efficient results than min-max and decimal scaling standardization.
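The three standardization methods named in the abstract, and the conventional K-means step they feed into, can be sketched as below. This is a minimal illustration, assuming a NumPy feature matrix with one row per record and one column per attribute. The function names (`min_max`, `z_score`, `decimal_scaling`, `assign`, `kmeans`) are hypothetical helpers written for this example, not the authors' code. The z-score version uses the population standard deviation. The K-means routine is a plain Lloyd-style loop with random initial centroids. The toy data is invented and is not taken from the infectious disease datasets.

```python
import numpy as np


def min_max(X):
    """Rescale each column linearly to [0, 1]: (x - min) / (max - min)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on constant columns
    return (X - lo) / span


def z_score(X):
    """Center each column at mean 0 and scale it to unit (population) standard deviation."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)  # constant columns are only centered
    return (X - mu) / sigma


def decimal_scaling(X):
    """Divide each column by 10**j, j being the smallest integer with max|x| / 10**j < 1."""
    max_abs = np.abs(X).max(axis=0)
    safe = np.where(max_abs > 0, max_abs, 1.0)  # avoid log10(0) on all-zero columns
    j = np.where(max_abs > 0, np.floor(np.log10(safe)) + 1, 0.0)
    return X / 10.0 ** j


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means with random initial centroids; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids


# Invented toy data: feature 0 spans hundreds, feature 1 spans single units.
X = np.array([[120.0, 1.0], [130.0, 1.2], [125.0, 8.9],
              [480.0, 1.1], [470.0, 9.0], [490.0, 8.8]])
for name, scale in [("min-max", min_max), ("z-score", z_score), ("decimal", decimal_scaling)]:
    labels, _ = kmeans(scale(X), k=2)
    print(name, labels)
```

Because K-means relies on Euclidean distance, an unscaled attribute with a wide range (feature 0 in the toy data) tends to dominate the cluster assignment, and each method above rescales it differently. Which method works best on real data is an empirical question; the abstract reports that z-score standardization did best on the authors' datasets.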

Ismail Mohamad | Dauda Usman