Biomedical question answering using semantic relations

Background

The proliferation of the scientific literature in the field of biomedicine makes it difficult to keep abreast of current knowledge, even for domain experts. While general Web search engines and specialized information retrieval (IR) systems have made important strides in recent decades, the problem of accurate knowledge extraction from the biomedical literature is far from solved. Classical IR systems usually return a list of documents that have to be read by the user to extract relevant information. This tedious and time-consuming work can be lessened with automatic Question Answering (QA) systems, which aim to provide users with direct and precise answers to their questions. In this work we propose a novel methodology for QA based on semantic relations extracted from the biomedical literature.

Results

We extracted semantic relations with the SemRep natural language processing system from 122,421,765 sentences, which came from 21,014,382 MEDLINE citations (i.e., the complete MEDLINE distribution up to the end of 2012). A total of 58,879,300 semantic relation instances were extracted and organized in a relational database. The QA process is implemented as a search in this database, which is accessed through a Web-based application, called SemBT (available at http://sembt.mf.uni-lj.si). We conducted an extensive evaluation of the proposed methodology in order to estimate the accuracy of extracting a particular semantic relation from a particular sentence. Evaluation was performed by 80 domain experts. In total, 7,510 semantic relation instances belonging to 2,675 distinct relations were evaluated 12,083 times. The instances were evaluated as correct 8,228 times (68%).

Conclusions

In this work we propose an innovative methodology for biomedical QA. The system is implemented as a Web-based application that is able to provide precise answers to a wide range of questions. A typical question is answered within a few seconds. The tool has some extensions that make it especially useful for interpretation of DNA microarray results.
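The Results describe QA as a search over a relational database of extracted semantic relations. The sketch below illustrates that general idea only; it is not the SemBT implementation. The table name `predication`, its columns, the helper names `build_toy_db` and `answer`, and the five rows are all hypothetical, invented for illustration, and assume relations are stored as subject-predicate-object triples, one row per extracted instance.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a hypothetical predication table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE predication ("
        "subject TEXT, predicate TEXT, object TEXT, sentence_id INTEGER)"
    )
    # Toy rows invented for illustration; one row per extracted instance.
    rows = [
        ("Metformin", "TREATS", "Diabetes Mellitus", 1),
        ("Metformin", "TREATS", "Diabetes Mellitus", 2),
        ("Insulin", "TREATS", "Diabetes Mellitus", 3),
        ("Aspirin", "TREATS", "Headache", 4),
        ("Obesity", "PREDISPOSES", "Diabetes Mellitus", 5),
    ]
    conn.executemany("INSERT INTO predication VALUES (?, ?, ?, ?)", rows)
    return conn


def answer(conn, predicate, obj):
    """Answer 'What <predicate> <obj>?' by listing matching subjects.

    Subjects are ranked by the number of supporting instances, then by name.
    """
    cur = conn.execute(
        "SELECT subject, COUNT(*) AS n FROM predication "
        "WHERE predicate = ? AND object = ? "
        "GROUP BY subject ORDER BY n DESC, subject ASC",
        (predicate, obj),
    )
    return cur.fetchall()


conn = build_toy_db()
answers = answer(conn, "TREATS", "Diabetes Mellitus")
print(answers)
```

Here the question "What treats diabetes mellitus?" becomes a query with the predicate and object fixed and the subject left open. Ranking answers by the number of supporting instances is a design choice of this sketch, not something the abstract states.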
