A new cluster validity measure based on general type-2 fuzzy sets: Application in gene expression data clustering

Clustering is a widely used pattern recognition technique with applications across science, engineering, and medicine. One of the latest advances in this field is the introduction of general type-2 fuzzy sets and of the clustering method built on them, general type-2 fuzzy c-means. This paper develops a robust and accurate similarity measure between general type-2 fuzzy sets. Building on the idea behind this similarity measure, it then proposes the first cluster validity index developed specifically for general type-2 fuzzy clustering, which is used to find the optimal number of clusters for general type-2 fuzzy c-means. To verify the quality of the proposed approaches, extensive computational experiments were conducted on artificial datasets and on real gene expression datasets. Numerical comparisons show the robustness and quality of the proposed approach relative to several similar approaches in the literature.
