Statistical Tests for Associations between Two Directed Acyclic Graphs

Biological data, and particularly annotation data, are increasingly represented as directed acyclic graphs (DAGs). However, while relevant biological information is implicit in the links between multiple domains, annotations from these different domains are usually represented in distinct, unconnected DAGs, which makes links between the domains difficult to determine. We develop a novel family of general statistical tests for discovering strong associations between two directed acyclic graphs. Our method takes into account the topology of the input graphs and the specificity and relevance of associations between nodes. We apply our method to the extraction of associations between biomedical ontologies in an extensive use case. Through manual and automatic evaluation, we show that our tests discover biologically relevant relations. The suite of statistical tests we develop for this purpose is implemented and freely available for download.
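
To make the idea of an association test between two DAGs concrete, the following is a minimal sketch and not the family of tests developed in this work. It assumes that the same items (for example, genes) are annotated to nodes in both graphs, propagates each annotation to all ancestors so that graph topology is respected, and scores a pair of nodes with an upper-tail hypergeometric p-value on their shared items. All names (`ancestors`, `propagate`, `overlap_p_value`) and the toy data are hypothetical.

```python
from math import comb

def ancestors(dag, node):
    """Return the node plus all its ancestors; dag maps child -> list of parents."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in dag.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(dag, annotations):
    """Map each node to the items annotated to it or to any of its descendants."""
    closed = {}
    for item, nodes in annotations.items():
        for node in nodes:
            for anc in ancestors(dag, node):
                closed.setdefault(anc, set()).add(item)
    return closed

def overlap_p_value(items_a, items_b, universe_size):
    """Upper-tail hypergeometric probability of an overlap at least as large as observed."""
    k = len(items_a & items_b)
    n_a, n_b = len(items_a), len(items_b)
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total

# Toy data (assumed): two tiny DAGs and four items annotated in both.
dag_a = {"a_leaf": ["a_root"]}
dag_b = {"b_leaf": ["b_root"]}
ann_a = {"g1": ["a_leaf"], "g2": ["a_leaf"], "g3": ["a_root"], "g4": ["a_root"]}
ann_b = {"g1": ["b_leaf"], "g2": ["b_leaf"], "g3": ["b_root"], "g4": ["b_root"]}

closed_a = propagate(dag_a, ann_a)
closed_b = propagate(dag_b, ann_b)
universe = len(set(ann_a) | set(ann_b))

p_leaf = overlap_p_value(closed_a["a_leaf"], closed_b["b_leaf"], universe)
p_root = overlap_p_value(closed_a["a_root"], closed_b["b_root"], universe)
print(p_leaf, p_root)
```

In this toy case the two root nodes share every item, yet their p-value is 1 because each root covers the whole universe, while the more specific leaf pair obtains a smaller p-value. This illustrates, under the assumptions above, why the specificity of the associated nodes matters when ranking associations.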
