Where to search top-K biomedical ontologies?

Abstract

Motivation: Searching for precise terms and terminological definitions in the biomedical data space is problematic, because researchers encounter overlapping, closely related and even equivalent concepts within a single ontology or across multiple ontologies. Search engines that retrieve ontological resources often return an extensive list of results for a given input term. Selecting the best-fit ontological resource (class or property) from that list is tedious and reduces user confidence in the retrieval engines. A systematic evaluation of these search engines is necessary to understand their strengths and weaknesses under different search requirements.

Results: We have implemented seven comparable Information Retrieval ranking algorithms to search through ontologies and compared them against four search engines for ontologies. Free-text queries were performed, the outcomes were judged by experts, and the ranking algorithms and search engines were evaluated against the expert-based ground truth (GT). In addition, we propose a probabilistic GT, developed automatically, that provides deeper insight into and confidence in the expert-based GT and allows a broader range of search queries to be evaluated.

Conclusion: The main outcome of this work is the identification of key search factors for biomedical ontologies, together with search requirements and a set of recommendations that will help biomedical experts and ontology engineers select the best-suited retrieval mechanism for their search scenarios. We expect that this evaluation will allow researchers and practitioners to apply current search techniques more reliably and will help them select the right solution for their daily work.

Availability: The source code of the seven ranking algorithms, the ground truths and the experimental results are available at https://github.com/danielapoliveira/bioont-search-benchmark
