A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering

BACKGROUND AND OBJECTIVE: Passage retrieval, the identification of top-ranked passages that may contain the answer to a given biomedical question, is a crucial component of any biomedical question answering (QA) system. Passage retrieval in open-domain QA is a longstanding challenge that has been widely studied over the last decades, but it still requires further effort in biomedical QA. In this paper, we present a new biomedical passage retrieval method based on sentence-length passages produced with Stanford CoreNLP, a probabilistic information retrieval (IR) model, and UMLS concepts.

METHODS: In the proposed method, we first use our document retrieval system, which is based on the PubMed search engine and UMLS similarity, to retrieve documents relevant to a given biomedical question. We then take the abstracts of the retrieved documents and use the Stanford CoreNLP sentence splitter to produce a set of sentences, i.e., candidate passages. Finally, using stemmed words and UMLS concepts as features for the BM25 model, we compute the similarity score between the biomedical question and each candidate passage and keep the N top-ranked passages.

RESULTS: Experimental evaluations on large standard datasets provided by the BioASQ challenge show that the proposed method significantly outperforms current state-of-the-art methods, by an average of 6.84% in terms of mean average precision (MAP).

CONCLUSION: We have proposed an efficient passage retrieval method that can be used to retrieve relevant passages in biomedical QA systems with high mean average precision.
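The final ranking step described in the methods can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each question and candidate passage has already been reduced to a flat list of features (stemmed words plus UMLS concept identifiers), and it uses a common non-negative BM25 IDF variant with the conventional defaults k1 = 1.2 and b = 0.75, none of which are specified in the abstract. The function name `bm25_rank` and the feature strings such as `CUI_ASPIRIN` are hypothetical placeholders.

```python
import math
from collections import Counter


def bm25_rank(question, passages, k1=1.2, b=0.75, top_n=10):
    """Rank candidate passages against a question with a BM25 score.

    question: list of features (assumed: stemmed words + concept IDs).
    passages: list of feature lists, one per candidate passage.
    Returns up to top_n (score, passage_index) pairs, best first.
    k1 and b are conventional defaults, not values from the paper.
    """
    n = len(passages)
    if n == 0:
        return []
    # Average passage length; fall back to 1 if every passage is empty.
    avg_len = (sum(len(p) for p in passages) / n) or 1.0

    # Document frequency: number of passages containing each feature.
    df = Counter()
    for p in passages:
        df.update(set(p))

    scored = []
    for idx, p in enumerate(passages):
        tf = Counter(p)
        score = 0.0
        for term in set(question):
            if term not in tf:
                continue
            # Non-negative IDF variant (assumption; other variants exist).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(p) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scored.append((score, idx))

    # Highest score first; ties broken by original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Toy example with hypothetical features (stems and placeholder concept IDs).
candidate_passages = [
    ["aspirin", "CUI_ASPIRIN", "reduc", "fever"],
    ["insulin", "regul", "glucos"],
    ["fever", "caus", "infect"],
]
question_features = ["aspirin", "CUI_ASPIRIN", "fever"]
ranking = bm25_rank(question_features, candidate_passages, top_n=2)
```

In this toy run, the first passage shares all three question features and ranks first, while the passage that shares only `fever` ranks second. Treating concept identifiers as ordinary terms alongside stems is one simple way to combine the two feature types; the paper may weight or combine them differently.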
