Exploring Query Expansion for Entity Searches in PubMed

Identifying relevant studies in the scientific literature is an important task in biomedical research. Past efforts have incorporated semantically recognized biological entities and medical ontologies into biomedical literature search, but the semantic relations between those entities have been largely overlooked by biomedical search engines. In this work, we aim to discover synonymous biomedical semantic relations between entities and to explore their use in understanding query semantics, with the goal of improving retrieval performance. Specifically, we discover synonymous semantic relations from PubMed queries and apply them in two real-world scenarios: query expansion and query specification. In both scenarios, better PubMed retrieval effectiveness, in terms of recall and precision, is achieved, demonstrating the utility of our proposed approach.
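As a rough illustration of how mined synonymous relations could feed query expansion, the sketch below rewrites the relation phrase of an entity-relation-entity query into a Boolean OR group and scores a result set with the standard precision and recall definitions. The relation table, the query shape, and the function names (`expand_query`, `precision_recall`) are hypothetical assumptions chosen for illustration. This is not the authors' mining or expansion method, and the synonyms here are hand-written rather than discovered from PubMed queries.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table of synonymous relation phrases (an assumption for
# illustration); in the approach described above such pairs are mined
# from PubMed queries rather than written by hand.
SYNONYMOUS_RELATIONS: Dict[str, List[str]] = {
    "treats": ["treatment", "therapy"],
    "causes": ["induced", "associated with"],
}


def expand_query(entity_a: str, relation: str, entity_b: str,
                 synonyms: Dict[str, List[str]]) -> str:
    """Turn 'A relation B' into a Boolean query whose relation part is an
    OR group of the original phrase plus its known synonyms."""
    variants = [relation] + synonyms.get(relation, [])
    # Quote each variant so multi-word phrases are searched as phrases.
    relation_clause = " OR ".join(f'"{v}"' for v in variants)
    return f"{entity_a} AND ({relation_clause}) AND {entity_b}"


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 if empty)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    print(expand_query("aspirin", "treats", "headache", SYNONYMOUS_RELATIONS))
    # hypothetical document IDs for a toy evaluation
    print(precision_recall({1, 2, 3, 4}, {2, 4, 5}))
```

Widening the OR group tends to trade some precision for recall; a helper like `precision_recall` makes that trade-off measurable on a labelled result set.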
