An adaptive term proximity based Rocchio’s model for clinical decision support retrieval

To better help doctors make decisions in the clinical setting, research is needed to connect electronic health records (EHRs) with the biomedical literature. Pseudo Relevance Feedback (PRF) is a classical query modification technique that has been shown to be effective in many retrieval models, which makes it suitable for handling the terse language and clinical jargon found in EHRs. Previous work has introduced a set of constraints (axioms) for the traditional PRF model. However, most methods do not consider both of the following factors: the importance of a candidate term in the feedback documents, and the co-occurrence relationship between a candidate term and a query term. Intuitively, terms that co-occur more strongly with a query term are more likely to be related to the query topic. In this paper, we incorporate the original HAL (Hyperspace Analogue to Language) model into Rocchio’s model and propose a new concept, the term proximity feedback weight. The resulting HAL-based Rocchio’s model for query expansion is called HRoc. We also design three normalization methods to better incorporate proximity information into query expansion. Finally, we introduce an adaptive parameter that replaces the fixed sliding-window length of the HAL model, so that the window size is selected according to document length. On the 2016 TREC Clinical Support medicine dataset, experimental results demonstrate that the proposed HRoc and HRoc_AP models outperform other advanced models, such as PRoc2 and TF-PRF, on various evaluation metrics. Compared with the PRoc2 and TF-PRF models, the MAP of our model increases by 8.5% and 12.24% respectively, and the F1 score increases by 7.86% and 9.88% respectively. The proposed HRoc model thus improves both the precision and the recall of retrieval relative to these models. Furthermore, with the self-adaptive parameter, the HRoc_AP model uses fewer hyperparameters than other models while achieving equivalent performance, which improves the efficiency and applicability of the model and helps clinicians retrieve clinical support documents effectively.
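
The general idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the HRoc formulas, so the helper names (`adaptive_window`, `hal_weights`, `expand_query`), the window rule (a fixed fraction of document length), the max-based normalization, and the `mix`, `alpha`, and `beta` values are all assumptions chosen for illustration. The only part taken from the HAL model itself is the distance weighting, in which a co-occurrence at distance d inside a window of size n contributes n - d + 1.

```python
from collections import Counter, defaultdict


def adaptive_window(doc_len, fraction=0.5):
    """Hypothetical adaptive rule: window grows with document length."""
    return max(1, int(fraction * doc_len))


def hal_weights(tokens, query_terms, window):
    """HAL-style proximity score of each non-query term to the query terms.

    A co-occurrence at distance d (1 <= d <= window) adds window - d + 1,
    so closer terms receive larger weights.
    """
    scores = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok not in query_terms:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i or tokens[j] in query_terms:
                continue
            scores[tokens[j]] += window - abs(i - j) + 1
    return dict(scores)


def expand_query(query, feedback_docs, alpha=1.0, beta=0.75,
                 mix=0.5, fraction=0.5, top_k=5):
    """Rocchio-style expansion using term importance and proximity.

    `mix` balances normalized term frequency (importance) against the
    normalized HAL-style proximity score; both are assumptions here.
    """
    query_terms = set(query)
    importance = defaultdict(float)
    proximity = defaultdict(float)
    for doc in feedback_docs:
        if not doc:
            continue
        window = adaptive_window(len(doc), fraction)
        for term, count in Counter(doc).items():
            if term not in query_terms:
                importance[term] += count / len(doc)
        for term, score in hal_weights(doc, query_terms, window).items():
            proximity[term] += score

    # Max normalization keeps both signals in [0, 1] (one possible choice).
    max_imp = max(importance.values(), default=0.0) or 1.0
    max_prox = max(proximity.values(), default=0.0) or 1.0
    candidates = {
        t: (1 - mix) * importance[t] / max_imp
           + mix * proximity.get(t, 0.0) / max_prox
        for t in importance
    }
    best = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

    expanded = {t: alpha for t in query_terms}  # original query terms
    for term, weight in best:
        expanded[term] = beta * weight          # feedback terms
    return expanded


if __name__ == "__main__":
    docs = [
        ["patient", "persistent", "cough", "fever", "treated", "antibiotics"],
        ["chronic", "cough", "syrup", "recommended", "for", "patient"],
    ]
    for term, weight in sorted(expand_query(["cough"], docs).items()):
        print(f"{term}\t{weight:.3f}")
```

In this sketch, a term that appears next to a query term in the feedback documents receives a larger expansion weight than an equally frequent term that appears far from it, which is the intuition the abstract gives for using co-occurrence information.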
