Gene Ontology term overlap as a measure of gene functional similarity

Background: The availability of various high-throughput experimental and computational methods allows biologists to rapidly infer functional relationships between genes. It is often necessary to evaluate these predictions computationally, a task that requires a reference database for functional relatedness. One such reference is the Gene Ontology (GO). A number of groups have suggested that the semantic similarity of the GO annotations of genes can serve as a proxy for functional relatedness. Here we evaluate a simple measure of semantic similarity, term overlap (TO).

Results: We computed the TO for randomly selected gene pairs from the mouse genome. For comparison, we implemented six previously reported semantic similarity measures, all of which infer information content from term probabilities, as well as three vector-based approaches and a normalized version of the TO measure. We find that the overlap measure is highly correlated with the others but differs in detail. TO is at least as good a predictor of sequence similarity as the other measures. We further show that term overlap may avoid some problems that affect the probability-based measures. Term overlap is also much faster to compute than the information content-based measures.

Conclusion: Our experiments suggest that term overlap can serve as a simple and fast alternative to other approaches that use explicit information content estimation or require complex pre-calculations, while also avoiding problems that some other measures may encounter.
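To make the idea of term overlap concrete, the following Python sketch counts the GO terms two genes share. The abstract does not spell out the exact definition, so the details here are assumptions: each gene's direct annotations are expanded to include all ancestor terms, TO is the size of the intersection of the expanded sets, and the normalized variant divides by the size of the smaller expanded set. The toy ontology, term names, and function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: each term maps to its direct parents.
# Real GO identifiers (e.g. "GO:0005524") would be used in practice.
PARENTS: Dict[str, Set[str]] = {
    "binding": {"root"},
    "catalysis": {"root"},
    "kinase": {"catalysis"},
    "atp_binding": {"binding"},
}


def with_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given terms plus every ancestor reachable via `parents`."""
    seen: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def term_overlap(a: Iterable[str], b: Iterable[str],
                 parents: Dict[str, Set[str]]) -> int:
    """Assumed TO: number of terms shared after ancestor expansion."""
    return len(with_ancestors(a, parents) & with_ancestors(b, parents))


def normalized_term_overlap(a: Iterable[str], b: Iterable[str],
                            parents: Dict[str, Set[str]]) -> float:
    """Assumed normalization: shared terms divided by the smaller set size."""
    expanded_a = with_ancestors(a, parents)
    expanded_b = with_ancestors(b, parents)
    smaller = min(len(expanded_a), len(expanded_b))
    return len(expanded_a & expanded_b) / smaller if smaller else 0.0


gene_a = {"kinase", "atp_binding"}
gene_b = {"kinase"}
gene_c = {"binding"}

print(term_overlap(gene_a, gene_b, PARENTS))             # shared: kinase, catalysis, root
print(normalized_term_overlap(gene_b, gene_c, PARENTS))  # only root is shared
```

Because the computation reduces to set intersection, it needs no term probabilities estimated from an annotation corpus, which is consistent with the abstract's point that TO is faster to compute than information content-based measures.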
