An Entropy Regularization k-Means Algorithm with a New Measure of between-Cluster Distance in Subspace Clustering

Most clustering approaches rely on within-cluster information, while between-cluster information, which is also important, is often ignored. In this study, we propose a new measure of between-cluster distance in subspace, which aims to maximize the distance between the center of a cluster and the points that do not belong to that cluster. Based on this idea, we first design an optimization objective function that integrates the between-cluster distance with entropy regularization. The updating rules are then derived by theoretical analysis. Next, the properties of the proposed algorithm (ERKM) are investigated, and its performance is evaluated experimentally on two synthetic and seven real-life datasets. The experimental results demonstrate that ERKM outperforms most existing state-of-the-art k-means-type clustering algorithms in most cases.
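The abstract does not state the objective function or the updating rules, so the following is only a rough sketch of the general idea, not the ERKM algorithm itself. It assumes a hypothetical per-feature score of the form (within-cluster dispersion) minus `eta` times (dispersion of non-member points around the cluster center), and uses the standard closed-form solution of an entropy-regularized weight problem, where the weights are a softmax of the negated scores divided by `gamma`. The function name `entropy_feature_weights` and the parameters `gamma` and `eta` are illustrative assumptions.

```python
import numpy as np

def entropy_feature_weights(X, labels, centers, gamma=1.0, eta=0.1):
    """Per-cluster feature weights from an entropy-regularized score.

    ASSUMPTION: score = within - eta * between is a hypothetical way to
    combine the two dispersions; the paper's actual objective may differ.
    """
    n_clusters, n_features = centers.shape
    weights = np.empty((n_clusters, n_features))
    for k in range(n_clusters):
        # Squared per-feature deviation of every point from center k.
        sq = (X - centers[k]) ** 2
        in_k = labels == k
        within = sq[in_k].sum(axis=0)    # points assigned to cluster k
        between = sq[~in_k].sum(axis=0)  # points not assigned to cluster k
        score = within - eta * between   # hypothetical combination
        # Minimizing sum_j w_j * score_j + gamma * sum_j w_j * log(w_j)
        # subject to sum_j w_j = 1 gives a softmax of -score / gamma.
        logits = -score / gamma
        logits -= logits.max()           # numerical stability
        w = np.exp(logits)
        weights[k] = w / w.sum()
    return weights

# Toy data: feature 0 separates the two clusters, feature 1 does not.
X = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])
W = entropy_feature_weights(X, labels, centers, gamma=10.0, eta=0.1)
```

In this toy case the weight on feature 0 dominates in both clusters, because that feature is compact within each cluster and far from the other cluster's points. A larger `gamma` would spread the weights more evenly across features.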
