Prediction of Co-Regulated Gene Groups through Gene Ontology

The Gene Ontology (GO) is organized into three principles: cellular component, biological process, and molecular function. Analysis of the GO annotations of differentially expressed genes from microarray experiments has become a common aid to their biological interpretation. Earlier GO analyses were based on a single principle, mostly biological process, and neglected valuable information in the other two. This paper proposes a novel approach to investigating gene co-regulation based on GO annotations from all three principles. We used the semantic similarity of GO annotations as a measure to partition genes into functionally related clusters, and developed a performance index (PI) that consolidates GO annotations from all three principles to measure the quality of each cluster. We applied our algorithm to a yeast dataset. Our results indicate that PI is a good measure of the likelihood that a cluster is co-regulated by one or more transcription factors (TFs). A further analysis based on each GO principle individually indicates that, with regard to gene co-regulation, annotations in biological process are the most informative and those in cellular component are the least informative. However, none of the analyses based on an individual principle provided satisfactory classification, so it is important to consider gene annotations in all three principles.
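To make the idea of semantic similarity between GO annotations concrete, the following is a minimal sketch, not the authors' implementation. It assumes Lin's information-theoretic measure (the abstract does not state which measure was used), a hypothetical five-term toy ontology with made-up term names rather than real GO identifiers, and hypothetical gene names. The gene-level score is taken as the best-matching term pair, which is also an assumption.

```python
import math

# Hypothetical toy ontology: term -> parent terms (not real GO identifiers).
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "glycolysis": ["metabolism"],
    "lipid_synthesis": ["metabolism"],
}

# Hypothetical annotations: gene -> directly annotated terms.
ANNOTATIONS = {
    "geneA": ["glycolysis"],
    "geneB": ["glycolysis"],
    "geneC": ["lipid_synthesis"],
    "geneD": ["transport"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities():
    """Fraction of all annotations that fall at or below each term."""
    counts = {term: 0 for term in PARENTS}
    total = 0
    for terms in ANNOTATIONS.values():
        for term in terms:
            total += 1
            for anc in ancestors(term):
                counts[anc] += 1
    return {term: count / total for term, count in counts.items()}


# Every term in this toy example has at least one annotation, so no
# probability is zero and the logarithms below are always defined.
PROB = term_probabilities()


def lin_similarity(t1, t2):
    """Lin similarity: 2 * log p(shared ancestor) / (log p(t1) + log p(t2))."""
    denom = math.log(PROB[t1]) + math.log(PROB[t2])
    if denom == 0:  # both terms are the root, which carries no information
        return 1.0 if t1 == t2 else 0.0
    shared = ancestors(t1) & ancestors(t2)
    # The most informative shared ancestor is the one with the lowest probability.
    p_shared = min(PROB[term] for term in shared)
    return 2 * math.log(p_shared) / denom


def gene_similarity(g1, g2):
    """Assumed gene-level score: the best-matching pair of annotated terms."""
    return max(
        lin_similarity(a, b)
        for a in ANNOTATIONS[g1]
        for b in ANNOTATIONS[g2]
    )


if __name__ == "__main__":
    for other in ("geneB", "geneC", "geneD"):
        print("geneA", other, round(gene_similarity("geneA", other), 3))
```

A pairwise matrix of such scores could then be passed to any standard clustering routine to obtain functionally related gene groups. Computing the matrix separately for each of the three principles would be one way to compare or combine them, although how the paper's PI consolidates them is not specified in this abstract.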
