Simple but Effective Knowledge-Based Query Reformulations for Precision Medicine Retrieval

In Information Retrieval (IR), the semantic gap is the mismatch between users’ queries and the way retrieval models answer these queries. In this paper, we explore how to use external knowledge resources to enhance bag-of-words representations and reduce the effect of the semantic gap between queries and documents. To this end, we propose several simple but effective knowledge-based query expansion and reduction techniques, and we evaluate them in the medical domain. The proposed query reformulations aim to increase the probability of retrieving relevant documents by adding highly specific terms to the original query, or by removing them from it. Experimental analyses on different test collections for Precision Medicine IR show the effectiveness of the developed techniques. In particular, a specific subset of query reformulations allows retrieval models to achieve top-performing results on all the considered test collections.
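The abstract does not spell out the individual reformulations, so the following is only a minimal sketch of the general idea of knowledge-based expansion and reduction of a bag-of-words query. It assumes the query is a list of terms and that the external knowledge resource is a plain dictionary mapping a term to related terms. The names `expand_query` and `reduce_query`, the removal set, and the toy entries are hypothetical illustrations, not the paper's techniques and not the contents of any real thesaurus.

```python
from typing import Dict, List, Set

# Hypothetical knowledge resource: maps a query term to related terms
# (e.g. synonyms a medical thesaurus might provide). Toy data only.
KnowledgeBase = Dict[str, List[str]]


def expand_query(query: List[str], kb: KnowledgeBase) -> List[str]:
    """Append related terms from the knowledge base, skipping duplicates."""
    expanded = list(query)
    seen = set(query)
    for term in query:
        for related in kb.get(term, []):
            if related not in seen:
                expanded.append(related)
                seen.add(related)
    return expanded


def reduce_query(query: List[str], to_remove: Set[str]) -> List[str]:
    """Drop terms flagged for removal, keeping the order of the rest."""
    return [term for term in query if term not in to_remove]


if __name__ == "__main__":
    kb = {"cancer": ["neoplasm", "tumor"], "braf": ["b-raf"]}
    query = ["lung", "cancer", "braf", "male"]
    print(expand_query(query, kb))
    # ['lung', 'cancer', 'braf', 'male', 'neoplasm', 'tumor', 'b-raf']
    print(reduce_query(query, {"male"}))
    # ['lung', 'cancer', 'braf']
```

Expansion only appends terms, so the original query terms keep their positions; reduction is a simple filter. In a real system, which terms to add or remove would come from the chosen knowledge resource and the retrieval model's weighting, which this sketch leaves out.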
