Rough-Fuzzy Clustering for Grouping Functionally Similar Genes from Microarray Data

Gene expression data clustering is one of the important tasks of functional genomics, as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes represents the basic challenge in the gene clustering problem. In this regard, a gene clustering algorithm, termed robust rough-fuzzy c-means, is proposed that judiciously integrates the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environments. The concepts of the possibilistic lower bound and the probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enable efficient selection of gene clusters. An efficient method is also proposed to select the initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near-optimum solution and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.
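
The interplay described above, with possibilistic memberships weighting a cluster's lower approximation and probabilistic memberships weighting its boundary, can be illustrated with a small sketch. This is not the authors' exact robust rough-fuzzy c-means update; it assumes standard fuzzy c-means memberships for the probabilistic part, Krishnapuram-Keller style possibilistic memberships with a scale eta_i estimated with K = 1, an RFCM-style rule that sends an object to the boundary of its top two clusters when their membership difference is at most delta, and a prototype formed as w times the lower-approximation mean plus (1 - w) times the boundary mean. The function names and the default values of m1, m2, w and delta are hypothetical, and initial prototypes are simply supplied by the caller rather than chosen by the paper's selection method.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_memberships(x: Point, centroids: List[List[float]], m1: float) -> List[float]:
    """Probabilistic (FCM-style) memberships of one object; they sum to 1."""
    d = [sq_dist(x, v) for v in centroids]
    for i, di in enumerate(d):
        if di == 0.0:  # object sits on a prototype: crisp membership avoids 0/0
            return [1.0 if k == i else 0.0 for k in range(len(d))]
    p = 1.0 / (m1 - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


def weighted_mean(points: List[Point], weights: List[float]) -> List[float]:
    """Component-wise weighted average of a non-empty list of points."""
    total = sum(weights)
    return [sum(wt * p[k] for wt, p in zip(weights, points)) / total
            for k in range(len(points[0]))]


def rough_fuzzy_cmeans(data, centroids, m1=2.0, m2=2.0, w=0.95, delta=0.1,
                       max_iter=100, tol=1e-6):
    """Hypothetical rough-fuzzy c-means sketch; returns prototypes, lower sets, boundary sets."""
    c = len(centroids)
    if c < 2:
        raise ValueError("need at least two clusters")
    v = [list(p) for p in centroids]
    lower: List[List[int]] = []
    boundary: List[List[int]] = []
    for _ in range(max_iter):
        mu = [fuzzy_memberships(x, v, m1) for x in data]
        # Assumed PCM-style scale eta_i (K = 1): fuzzy-weighted mean squared distance.
        eta = []
        for i in range(c):
            num = sum(u[i] ** m1 * sq_dist(x, v[i]) for u, x in zip(mu, data))
            den = sum(u[i] ** m1 for u in mu)
            eta.append(max(num / max(den, 1e-12), 1e-12))
        # Rough partition: ambiguous objects go to the boundary of their top two clusters.
        lower = [[] for _ in range(c)]
        boundary = [[] for _ in range(c)]
        for j, u in enumerate(mu):
            first, second = sorted(range(c), key=lambda i: u[i], reverse=True)[:2]
            if u[first] - u[second] <= delta:
                boundary[first].append(j)
                boundary[second].append(j)
            else:
                lower[first].append(j)
        new_v = []
        for i in range(c):
            a = b = None
            if lower[i]:
                # Possibilistic memberships weight the lower approximation.
                nu = [1.0 / (1.0 + (sq_dist(data[j], v[i]) / eta[i]) ** (1.0 / (m2 - 1.0)))
                      for j in lower[i]]
                a = weighted_mean([data[j] for j in lower[i]], [n ** m2 for n in nu])
            if boundary[i]:
                # Probabilistic memberships weight the boundary region.
                b = weighted_mean([data[j] for j in boundary[i]],
                                  [mu[j][i] ** m1 for j in boundary[i]])
            if a is not None and b is not None:
                new_v.append([w * ak + (1.0 - w) * bk for ak, bk in zip(a, b)])
            else:
                new_v.append(a if a is not None else b if b is not None else v[i])
        shift = max(sq_dist(p, q) for p, q in zip(v, new_v))
        v = new_v
        if shift < tol:
            break
    return v, lower, boundary
```

For example, `rough_fuzzy_cmeans([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (5.5, 5.5)], [(0.0, 0.0), (10.0, 10.0)])` should place each tight group in the lower approximation of its own cluster while the point midway between them lands in the boundary of both, which is the kind of overlap the rough-fuzzy formulation is meant to represent. Giving the lower approximation the larger weight w keeps prototypes anchored to unambiguous members, so boundary objects influence them only mildly.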
