Bayesian ontology querying for accurate and noise-tolerant semantic searches

MOTIVATION: Ontologies provide a structured representation of the concepts of a domain of knowledge and of the relations between them. Attribute ontologies describe the characteristics of the items of a domain, such as the functions of proteins or the signs and symptoms of diseases. This opens the possibility of searching a database of items for the best match to a list of observed or desired attributes. However, naive search methods perform poorly on realistic data because the data are noisy, typical queries are imprecise, and individual items may not display all attributes of the category to which they belong.

RESULTS: We present a method that combines ontological analysis with Bayesian networks to deal with noise, imprecision and attribute frequencies, and we demonstrate an application of the method as a differential diagnostic support system for human genetics.

AVAILABILITY: We provide an implementation of the algorithm and the benchmark at http://compbio.charite.de/boqa/.

CONTACT: Sebastian.Bauer@charite.de or Peter.Robinson@charite.de

SUPPLEMENTARY INFORMATION: Supplementary Material for this article is available at Bioinformatics online.
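
The general idea of ranking items against a noisy, imprecise query can be illustrated with a toy sketch. This is not the published algorithm: the ontology, the items, the function names and the error rates `alpha` (false-positive rate) and `beta` (false-negative rate) are all hypothetical. The sketch also treats terms as independent and ignores attribute frequencies, which the abstract says the actual method takes into account.

```python
from math import exp, log

# Hypothetical toy ontology, given as child -> list of parents.
PARENTS = {
    "root": [],
    "heart_defect": ["root"],
    "septal_defect": ["heart_defect"],
    "eye_defect": ["root"],
    "cataract": ["eye_defect"],
}

# Hypothetical items (e.g. diseases) with their direct annotations.
ITEMS = {
    "disease_A": {"septal_defect", "cataract"},
    "disease_B": {"septal_defect"},
    "disease_C": {"cataract"},
}


def with_ancestors(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def posterior(query, alpha=0.01, beta=0.2):
    """Posterior over items given the query, assuming a uniform prior.

    alpha: assumed probability that a term is observed although the
           item does not have it (false positive).
    beta:  assumed probability that a term is not observed although
           the item has it (false negative).
    """
    observed = with_ancestors(query)
    log_scores = {}
    for item, annotations in ITEMS.items():
        hidden = with_ancestors(annotations)
        log_lik = 0.0
        for term in PARENTS:
            if term in hidden:
                log_lik += log(1 - beta) if term in observed else log(beta)
            else:
                log_lik += log(alpha) if term in observed else log(1 - alpha)
        log_scores[item] = log_lik
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    weights = {item: exp(s - top) for item, s in log_scores.items()}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}


# An imprecise query: the general term "heart_defect" is used instead
# of the more specific "septal_defect".
ranking = posterior({"heart_defect", "cataract"})
best = max(ranking, key=ranking.get)
```

In this toy setting no item is annotated directly with `heart_defect`, so an exact match on direct annotations would return nothing, whereas the probabilistic score still ranks the item annotated with both a heart and an eye abnormality first.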
