Correlation between gene expression and GO semantic similarity

This research analyzes some aspects of the relationship between gene expression, gene function, and gene annotation. Many recent studies are implicitly based on the assumption that gene products that are biologically and functionally related would maintain this similarity both in their expression profiles as well as in their gene ontology (GO) annotation. We analyze how accurate this assumption proves to be using real publicly available data. We also aim to validate a measure of semantic similarity for GO annotation. We use the Pearson correlation coefficient and its absolute value as a measure of similarity between expression profiles of gene products. We explore a number of semantic similarity measures (Resnik, Jiang, and Lin) and compute the similarity between gene products annotated using the GO. Finally, we compute correlation coefficients to compare gene expression similarity against GO semantic similarity. Our results suggest that the Resnik similarity measure outperforms the others and seems better suited for use in gene ontology. We also deduce that there seems to be correlation between semantic similarity in the GO annotation and gene expression for the three GO ontologies. We show that this correlation is negligible up to a certain semantic similarity value; then, for higher similarity values, the relationship trend becomes almost linear. These results can be used to augment the knowledge provided by clustering algorithms and in the development of bioinformatic tools for finding and characterizing gene products.

[1]  Olivier Bodenreider,et al.  Incorporating ontology-driven similarity knowledge into functional genomics: an exploratory study , 2004, Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering.

[2]  M. Bittner,et al.  Expression profiling in cancer using cDNA microarrays , 1999, Electrophoresis.

[3]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Philip Resnik,et al.  Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language , 1999, J. Artif. Intell. Res..

[5]  Joshua M. Stuart,et al.  A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules , 2003, Science.

[6]  G. Miller,et al.  Contextual correlates of semantic similarity , 1991 .

[7]  Roy Rada,et al.  Development and application of a metric on semantic nets , 1989, IEEE Trans. Syst. Man Cybern..

[8]  D. Barrell,et al.  The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro. , 2003, Genome research.

[9]  David W. Conrath,et al.  Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.

[10]  Bing Zhang,et al.  GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies , 2004, BMC Bioinformatics.

[11]  Carole A. Goble,et al.  Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation , 2003, Bioinform..

[12]  P. Goodfellow,et al.  DNA microarrays in drug discovery and development , 1999, Nature Genetics.

[13]  Alan F. Smeaton,et al.  Using WordNet in a Knowledge-Based Approach to Information Retrieval , 1995 .

[14]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[15]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[16]  Graeme Hirst,et al.  Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures , 2004 .

[17]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[18]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[19]  Angel Rubio,et al.  GARBAN: genomic analysis and rapid biological annotation of cDNA microarray and proteomic data , 2003, Bioinform..

[20]  Joaquín Dopazo,et al.  Bioinformatics methods for the analysis of expression arrays: data clustering and information extraction. , 2002, Journal of biotechnology.

[21]  Pedro M. Coutinho,et al.  Implementation of a Functional Semantic Similarity Measure between Gene-Products , 2003 .

[22]  Joaquín Dopazo,et al.  FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes , 2004, Bioinform..

[23]  Marsha Wills-Karp,et al.  Time to draw breath: asthma-susceptibility genes are identified , 2004, Nature Reviews Genetics.

[24]  Carole A. Goble,et al.  Semantic Similarity Measures as Tools for Exploring the Gene Ontology , 2002, Pacific Symposium on Biocomputing.

[25]  J. Blake,et al.  Creating the Gene Ontology Resource : Design and Implementation The Gene Ontology Consortium 2 , 2001 .

[26]  A. Orth,et al.  Large-scale analysis of the human and mouse transcriptomes , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[27]  Homin K. Lee,et al.  Coexpression analysis of human genes across many microarray data sets. , 2004, Genome research.