Gradual representation of shadowed set for clustering gene expression data

Abstract: Microarrays are a widely used measurement technology that records the expression levels of a large number of genes across different time points. Cluster analysis, and in particular bi-clustering, extracts meaningful information from the correlation of a subset of genes with a subset of conditions. This helps in discovering biologically meaningful clusters despite the missing values, imprecision and noise present in microarray data sets. Although fuzzy sets are sufficient to handle the overlapping nature of bi-clusters, shadowed sets help in identifying and analyzing the genes that lie in the confusion area of the clusters. In this article, we propose a bi-clustering model based on the shadowed set with gradual representation of cardinality, named Gradual Shadowed Set for Gene Expression (GSS-GE) clustering. It identifies the bi-clusters in the core and in the shadowed region and evaluates their biological significance. The effectiveness of the proposed GSS-GE is demonstrated on three real data sets: yeast, serum and mouse. Its performance is compared with Cheng and Church's algorithm (CC), Bimax, the order-preserving submatrix (OPSM) algorithm, Large Average Submatrices (LAS), the statistical plaid model and a modified fuzzy co-clustering (MFCC) algorithm. For the mouse data set, no cluster-level analysis of the microarray has been reported so far. We also report statistical and biological significance results to support the superiority of the proposed GSS-GE.
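To make the core and shadowed regions mentioned in the abstract concrete, the following minimal sketch splits the fuzzy memberships of genes in one cluster into core, shadow and exclusion regions. It assumes the classical single-threshold shadowed set construction with a symmetric threshold `alpha`; it is not the GSS-GE model itself, whose gradual cardinality and threshold selection are not described here. The function name `shadowed_regions`, the value `alpha=0.3` and the sample memberships are hypothetical and chosen only for illustration.

```python
def shadowed_regions(memberships, alpha=0.3):
    """Split fuzzy memberships into core, shadow and exclusion index lists.

    Assumption: symmetric single-threshold shadowed set, where memberships
    at or above 1 - alpha are elevated to the core, memberships at or below
    alpha are reduced to exclusion, and everything in between is the shadow.
    """
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie strictly between 0 and 0.5")

    core, shadow, exclusion = [], [], []
    for gene_index, u in enumerate(memberships):
        if u >= 1.0 - alpha:
            core.append(gene_index)       # clearly belongs to the cluster
        elif u <= alpha:
            exclusion.append(gene_index)  # clearly outside the cluster
        else:
            shadow.append(gene_index)     # confusion area of the cluster
    return core, shadow, exclusion


# Hypothetical memberships of five genes in a single cluster.
memberships = [0.95, 0.72, 0.50, 0.31, 0.05]
core, shadow, exclusion = shadowed_regions(memberships, alpha=0.3)
print(core, shadow, exclusion)  # [0, 1] [2, 3] [4]
```

Genes in the shadow list are the ones whose cluster assignment remains uncertain, which is the region the abstract refers to as the confusion area.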
