Semantic biomedical resource discovery: a Natural Language Processing framework

Background: A plethora of publicly available biomedical resources currently exist, and their number is growing rapidly. In parallel, specialized repositories are being developed that index numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty of locating appropriate resources for a clinical or biomedical decision task, especially for users who are not Information Technology experts. Meanwhile, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for annotating biomedical resources with domain-specific ontologies, and to exploit Natural Language Processing methods to empower users who are not Information Technology experts to search efficiently for biomedical resources using natural language.

Methods: A Natural Language Processing engine has been implemented that can "translate" free text into targeted queries, automatically transforming a clinical research question into a request description that contains only ontology terms. The implementation is based on information extraction techniques for natural language text, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions onto suitable domain ontologies, in order to ensure that the descriptions of biomedical resources are domain oriented and to enhance the accuracy of service discovery. The framework is freely available as a web application at http://calchas.ics.forth.gr/.

Results: For our experiments, a range of clinical questions was established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians. Domain experts manually identified the tools available in a tools repository that are suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. The results were compared with those obtained from an automated discovery of candidate biomedical tools. Precision and recall were used to evaluate the results. Our results indicate that the proposed framework has high precision and low recall, meaning that the system returns substantially more relevant results than irrelevant ones, although it does not retrieve every relevant tool.

Conclusions: Adequate biomedical ontologies are already available, and existing NLP tools and biomedical annotation systems are sufficient in number and quality to implement a biomedical resource discovery framework based on the semantic annotation of resources and the use of NLP techniques. The results of the present study demonstrate the clinical utility of the proposed framework, which aims to bridge the gap between a clinical question posed in natural language and efficient, dynamic discovery of biomedical resources.
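
The precision and recall measures used in the evaluation can be illustrated with a minimal sketch. It assumes that, for one clinical question, the expert-selected tools and the automatically discovered tools are each available as a set of identifiers; the function name and the tool names below are hypothetical and are not taken from the framework itself.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one clinical question.

    retrieved: tools returned by the automated discovery (assumed input)
    relevant:  tools the domain experts judged suitable (assumed input)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: share of returned tools that are actually relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant tools that were actually returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: experts picked four tools, the system returned two.
expert_selected = {"tool_a", "tool_b", "tool_c", "tool_d"}
system_returned = {"tool_a", "tool_b"}

precision, recall = precision_recall(system_returned, expert_selected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case every returned tool is relevant but half of the relevant tools are missed, which is the high-precision, low-recall pattern described in the Results.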
