A new validity measure for a correlation-based fuzzy c-means clustering algorithm

One of the major challenges in unsupervised clustering is the lack of consistent means for assessing the quality of clusters. In this paper, we evaluate several validity measures in fuzzy clustering and develop a new measure for a fuzzy c-means algorithm which uses a Pearson correlation in its distance metrics. The measure is designed with within-cluster sum of square, and makes use of fuzzy memberships. In comparing to the existing fuzzy partition coefficient and a fuzzy validity index, this new measure performs consistently across six microarray datasets. The newly developed measure could be used to assess the validity of fuzzy clusters produced by a correlation-based fuzzy c-means clustering algorithm.

[1]  Gerardo Beni,et al.  A Validity Measure for Fuzzy Clustering , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Igor Jurisica,et al.  Gene expression–based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study , 2008, Nature Medicine.

[3]  David E. Misek,et al.  Gene-expression profiles predict survival of patients with lung adenocarcinoma , 2002, Nature Medicine.

[4]  Arlindo L. Oliveira,et al.  Biclustering algorithms for biological data analysis: a survey , 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[5]  M. Eisen,et al.  Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering , 2002, Genome Biology.

[6]  James C. Bezdek,et al.  Pattern Recognition with Fuzzy Objective Function Algorithms , 1981, Advanced Applications in Pattern Recognition.

[7]  Raffaele Giancarlo,et al.  Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer , 2008, BMC Bioinformatics.

[8]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Jill P. Mesirov,et al.  Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data , 2003, Machine Learning.

[10]  Ping Yang,et al.  A fuzzy c-means algorithm using a correlation metrics and gene ontology , 2008, 2008 19th International Conference on Pattern Recognition.

[11]  Ronald W. Davis,et al.  A genome-wide transcriptional analysis of the mitotic cell cycle. , 1998, Molecular cell.

[12]  Doulaye Dembélé,et al.  Fuzzy C-means Method for Clustering Microarray Data , 2003, Bioinform..