On Initializations for the Minkowski Weighted K-Means

Minkowski Weighted K-Means is a variant of K-Means set in the Minkowski space that automatically computes feature weights for each cluster. Like K-Means itself, its accuracy depends heavily on the initial centroids it is given. In this paper we report experiments comparing six initializations, a random one and five others set in the Minkowski space, in terms of accuracy, processing time, and recovery of the Minkowski exponent p. We find that the Ward method in the Minkowski space tends to outperform the other initializations, except on low-dimensional Gaussian models with noise features, where a modified version of intelligent K-Means excels.
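
To make the idea of cluster-specific feature weights concrete, here is a minimal sketch of the two quantities such a method typically relies on: a weighted Minkowski distance and a per-cluster weight update. The formulas are recalled from the general Minkowski Weighted K-Means literature and are not stated in this abstract, so they should be read as an assumption. The function names, the `eps` guard, and the choice to pass the centroid in as an argument are illustrative only.

```python
import numpy as np


def weighted_minkowski_distance(x, centroid, weights, p):
    """Weighted Minkowski distance raised to the power p (no p-th root).

    Assumed form: sum_v w_v^p * |x_v - c_v|^p.
    """
    return float(np.sum((weights ** p) * np.abs(x - centroid) ** p))


def update_weights(points, centroid, p, eps=1e-12):
    """Feature weights for one cluster, computed from per-feature dispersions.

    Assumed form: w_v = 1 / sum_u (D_v / D_u)^(1 / (p - 1)), where
    D_v = sum_i |y_iv - c_v|^p over the points in the cluster. Requires p > 1.
    """
    if p <= 1:
        raise ValueError("this weight update is only defined for p > 1")
    # eps avoids division by zero when a feature has no dispersion at all.
    dispersion = np.sum(np.abs(points - centroid) ** p, axis=0) + eps
    # ratios[v, u] = (D_v / D_u)^(1 / (p - 1))
    ratios = (dispersion[:, None] / dispersion[None, :]) ** (1.0 / (p - 1.0))
    return 1.0 / ratios.sum(axis=1)


# Two points in a cluster whose first feature is far more dispersed.
points = np.array([[0.0, 0.0], [4.0, 1.0]])
centroid = np.array([2.0, 0.5])
weights = update_weights(points, centroid, p=2.0)
distance = weighted_minkowski_distance(points[1], centroid, weights, p=2.0)
```

Under these assumptions the weights of a cluster sum to one, and the feature with the smaller within-cluster dispersion receives the larger weight, which is what lets noise features be down-weighted. The sketch takes the centroid as given; for p other than 2 the appropriate cluster centre is generally not the arithmetic mean.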
