A Column-Wise Distance-Based Approach for Clustering of Gene Expression Data with Detection of Functionally Inactive Genes and Noise

Because of the uncertainty and inherent noise in gene expression data, clustering such data is a challenging task. Many clustering algorithms assume that every gene belongs to some cluster. However, some genes are functionally inactive, i.e., they do not participate in any biological process under the experimental conditions, and should be segregated from the clusters. Based on this observation, this article proposes a clustering method that groups co-expressed genes and segregates functionally inactive genes and noise. The method places genes in a cluster with a specified gene if the difference between their expression level and that of the specified gene is less than a threshold t in every experimental condition; otherwise, the specified gene is marked as functionally inactive or noise. The method is applied to 10 yeast gene expression datasets, and the results show that it performs well compared with existing methods.
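
The column-wise criterion above can be illustrated with a minimal sketch. It assumes a genes × conditions matrix, takes unassigned genes as the "specified gene" in index order, and treats a specified gene with no qualifying partner as functionally inactive or noise. The function name `columnwise_threshold_clustering`, the seed order, and the strict `<` comparison are illustrative assumptions, not the authors' exact procedure.

```python
from typing import List, Sequence, Tuple


def columnwise_threshold_clustering(
    data: Sequence[Sequence[float]], t: float
) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical sketch of column-wise threshold clustering.

    data[g][j] is the expression level of gene g in condition j.
    Returns (clusters, inactive), both holding gene indices.
    """
    unassigned = list(range(len(data)))  # genes not yet clustered or flagged
    clusters: List[List[int]] = []
    inactive: List[int] = []  # genes flagged as functionally inactive or noise

    while unassigned:
        seed = unassigned[0]  # assumption: specified genes taken in index order
        rest = unassigned[1:]
        # A gene joins the seed's cluster only if it is within t of the seed
        # in EVERY condition (a per-column check, not an aggregate distance).
        members = [
            g for g in rest
            if all(abs(a - b) < t for a, b in zip(data[g], data[seed]))
        ]
        if members:
            clusters.append([seed] + members)
            taken = set(members)
            unassigned = [g for g in rest if g not in taken]
        else:
            # No gene tracks the seed closely enough: flag it, do not cluster it.
            inactive.append(seed)
            unassigned = rest
    return clusters, inactive


if __name__ == "__main__":
    expr = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, 2.9],
        [5.0, 5.0, 5.0],
        [0.9, 1.9, 3.1],
        [9.0, 0.0, 4.0],
    ]
    clusters, inactive = columnwise_threshold_clustering(expr, t=0.5)
    print(clusters)  # [[0, 1, 3]]
    print(inactive)  # [2, 4]
```

Because each candidate must stay within t of the specified gene in every condition, a single large deviation in one condition is enough to exclude it, which is what lets isolated or erratic profiles fall out as inactive or noise. In this sketch the result depends on the order in which specified genes are chosen; the original method may select them differently.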
