Finding Clusters of Positive and Negative Coregulated Genes in Gene Expression Data

In this paper, we propose a system for finding partial positive and negative coregulated gene clusters in microarray data. Genes are clustered together if they show the same pattern of changing tendencies in a user-defined number of condition pairs. It is assumed that genes which show similar expression patterns under a number of conditions are under the control of the same transcription factor and are related to a similar function in the cell. Taking positive and negative coregulation of genes into account, we find two types of information: (1) clusters of genes showing the same changing tendency and (2) relationships between two such clusters whose respective members show opposite changing tendencies. Because genes may be coregulated by different transcription factors under different environmental conditions, our algorithm allows the same gene to fall into different clusters. Overlapping gene clusters are allowed because coregulation normally takes place in only a fraction of the investigated condition pairs, and because gene expression data are noisy, so the approach should be tolerant of errors. In a first step, the gene expression matrix is transformed into a binned matrix of changing tendencies between all condition pairs. The binning of the gene expression levels uses a statistical technique that requires no arbitrary threshold, automatically corrects for multiple testing, and handles replicates for the different conditions, thereby directly accounting for the random variability of gene expression data. To present the clustering results, a new structure called the coregulation graph is proposed.
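The first step and the two kinds of coregulation can be illustrated with a minimal sketch. It is not the method of the paper: the binning here is a simple stand-in (a Welch-style t statistic compared with a fixed cutoff), which, unlike the statistical technique described above, does rely on an arbitrary threshold and does not correct for multiple testing. The function names `tendency`, `tendency_matrix` and `relation`, the `cutoff` and `min_pairs` parameters, and the toy data are all hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def tendency(a, b, cutoff=2.0):
    """Return +1 (up), -1 (down) or 0 (no clear change) from condition a to b.

    a and b are lists of replicate measurements (at least two each).
    Stand-in test (an assumption, not the paper's technique): a Welch-style
    t statistic compared with a fixed, arbitrary cutoff.
    """
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    diff = mean(b) - mean(a)
    if se == 0:
        # No replicate variability: fall back to the sign of the difference.
        return (diff > 0) - (diff < 0)
    t = diff / se
    if t > cutoff:
        return 1
    if t < -cutoff:
        return -1
    return 0


def tendency_matrix(expr, cutoff=2.0):
    """Map each gene to its binned tendencies over all condition pairs (i < j)."""
    return {
        gene: [tendency(conds[i], conds[j], cutoff)
               for i, j in combinations(range(len(conds)), 2)]
        for gene, conds in expr.items()
    }


def relation(t1, t2, min_pairs):
    """Classify two tendency vectors as 'positive', 'negative' or None.

    Two genes count as coregulated if they agree (positive) or are opposite
    (negative) on at least min_pairs condition pairs with a nonzero tendency.
    """
    same = sum(1 for x, y in zip(t1, t2) if x != 0 and x == y)
    opposite = sum(1 for x, y in zip(t1, t2) if x != 0 and x == -y)
    if same >= min_pairs:
        return "positive"
    if opposite >= min_pairs:
        return "negative"
    return None


# Toy data: gene -> one list of replicates per condition (hypothetical values).
expr = {
    "g1": [[1.0, 1.1], [2.0, 2.1], [3.0, 3.1]],
    "g2": [[5.0, 5.1], [6.0, 6.2], [7.0, 7.1]],
    "g3": [[3.0, 3.1], [2.0, 2.1], [1.0, 1.1]],
}
tm = tendency_matrix(expr)
for g, h in combinations(expr, 2):
    print(g, h, relation(tm[g], tm[h], min_pairs=2))
```

Because the comparison is pairwise and partial (only `min_pairs` condition pairs need to match), one gene can take part in several relationships at once, which mirrors the overlapping clusters described above. Building actual clusters and the coregulation graph from these pairwise relations is left out of the sketch.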
