Hybrid Dissimilarity Measurement for Intelligent Weight K-means Clustering

This paper addresses a drawback of Weight K-means (WK-means): its feature weights do not take direction in feature space into account. To overcome this limitation, we propose an intelligent Minkowski Weight K-means clustering algorithm with a hybrid dissimilarity measure (iMWK-HD), which enables direction-aware transformations of the feature space. The proposed hybrid dissimilarity measure assigns weights to features based on the Minkowski distance and the cosine dissimilarity, and can thereby extend, shrink and rotate the feature space so that the performance of WK-means clustering is improved. In iMWK-HD, a new optimization objective function is designed based on minimizing the hybrid dissimilarity, and new updating rules for the iterations of the clustering procedure are derived from it. Experimental results on UCI datasets demonstrate that iMWK-HD is superior to three existing clustering algorithms, i.e. iK-means, iWK-means and iMWK-means. In addition, the proposed algorithm is immune to irrelevant features in the cluster subspace.
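To make the idea of a hybrid dissimilarity concrete, the following minimal Python sketch combines a feature-weighted Minkowski term with a cosine dissimilarity term. The abstract does not state the exact formula, so everything here is an assumption for illustration: the function names, the weighting form (weights raised to the power p, as is common in Minkowski-weighted K-means), and the mixing parameter `alpha` are hypothetical and may differ from the definitions used in iMWK-HD.

```python
import math


def weighted_minkowski(x, c, w, p):
    """Feature-weighted Minkowski term: sum_v (w_v ** p) * |x_v - c_v| ** p.

    Assumption: no p-th root is taken, a form often used in
    Minkowski-weighted K-means objectives.
    """
    return sum((wv ** p) * abs(xv - cv) ** p for xv, cv, wv in zip(x, c, w))


def cosine_dissimilarity(x, c):
    """1 - cos(x, c); returns 1.0 if either vector is all zeros."""
    dot = sum(xv * cv for xv, cv in zip(x, c))
    norm_x = math.sqrt(sum(xv * xv for xv in x))
    norm_c = math.sqrt(sum(cv * cv for cv in c))
    if norm_x == 0.0 or norm_c == 0.0:
        return 1.0
    return 1.0 - dot / (norm_x * norm_c)


def hybrid_dissimilarity(x, c, w, p=2.0, alpha=0.5):
    """Hypothetical hybrid: magnitude term plus alpha times direction term.

    alpha is an illustrative trade-off parameter, not taken from the paper.
    """
    return weighted_minkowski(x, c, w, p) + alpha * cosine_dissimilarity(x, c)
```

In this sketch the Minkowski term responds to rescaling of individual features (extending or shrinking the space), while the cosine term responds only to the angle between an entity and a centroid, which is one way a direction-sensitive component could be introduced.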
