Clustering by Unified Principal Component Analysis and Fuzzy C-Means with Sparsity Constraint

When clustering high-dimensional data, most state-of-the-art algorithms first extract principal components and then apply a separate clustering method. However, the assignments produced by this two-stage strategy may deviate from those obtained by directly optimizing a unified objective function. In contrast to these traditional methods, we propose a novel method, referred to as clustering by unified principal component analysis and fuzzy c-means (UPF), for clustering high-dimensional data. Our model explores the underlying clustering structure in a low-dimensional space and performs clustering simultaneously. In particular, we impose an L0-norm constraint on the membership matrix to make it sparser. To solve the model, we propose an effective iterative optimization algorithm. Extensive experiments on several benchmark data sets, in comparison with two-stage algorithms, validate the effectiveness of the proposed method. The experimental results demonstrate that the proposed method achieves superior performance.
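For orientation, the two-stage baseline that the abstract contrasts with can be sketched as follows. This is a minimal illustration, not the UPF model itself: the abstract does not give the unified objective or its solver. The function names (`pca_project`, `fuzzy_c_means`, `keep_top_k`), the fuzzifier `m = 2`, and the synthetic data are all assumptions made for this example. The `keep_top_k` helper only mimics the effect of an L0-style constraint (at most `k` nonzero memberships per sample) as a post-hoc step, whereas the paper imposes the constraint inside the optimization.

```python
import numpy as np

def pca_project(X, n_components):
    """Stage 1: project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_c_means(Z, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Stage 2: standard fuzzy c-means by alternating center/membership updates."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((Z.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        # Squared distance from every sample to every center.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def keep_top_k(U, k):
    """Hypothetical helper: keep the k largest memberships per row, renormalize."""
    smallest = np.argsort(U, axis=1)[:, :-k]  # indices of all but the top k
    S = U.copy()
    np.put_along_axis(S, smallest, 0.0, axis=1)
    return S / S.sum(axis=1, keepdims=True)

# Usage on synthetic data: two well-separated blobs in 20 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 20)), rng.normal(6, 1, (30, 20))])
Z = pca_project(X, n_components=2)
U, centers = fuzzy_c_means(Z, n_clusters=2)
U_sparse = keep_top_k(U, k=1)
```

Because the projection in stage 1 is fixed before any cluster information is available, the subspace it selects need not be the one that best separates the clusters; this is the gap a unified objective is meant to close.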