Thalia: semantic search engine for biomedical abstracts

Abstract

Summary: Although the publication rate of the biomedical literature has grown steadily over recent decades, access to pertinent research publications remains a challenge for biologists and medical practitioners. This article describes Thalia, a semantic search engine that recognizes eight different types of concepts occurring in biomedical abstracts. Thalia is available via a web-based interface or a RESTful API. A key aspect of the search engine is that it is updated from PubMed daily. We describe the main building blocks of the tool and evaluate Thalia's retrieval capabilities on a precision medicine dataset.

Availability and implementation: Thalia is available at http://nactem.ac.uk/Thalia_BI/.

Supplementary information: Supplementary data are available at Bioinformatics online.
