Modelling Relevance towards Multiple Inclusion Criteria when Ranking Patients.

In the medical domain, information retrieval systems can be used to identify cohorts (i.e. patients) required for clinical studies. However, a challenge faced by such search systems is to retrieve the cohorts whose medical histories cover the inclusion criteria specified in a query, which are often complex and include multiple medical conditions. For example, a query may aim to find patients with both 'lupus nephritis' and 'thrombotic thrombocytopenic purpura'. In a typical best-match retrieval setting, any patient exhibiting all of the inclusion criteria should naturally be ranked higher than a patient who exhibits only a subset, or none, of the criteria. In this work, we extend the two main existing models for ranking patients to take into account the coverage of the inclusion criteria, by adapting techniques from recent research into coverage-based diversification. We propose a novel approach for modelling the coverage of the query inclusion criteria within the records of a particular patient, and thereby rank highly those patients whose medical records are likely to cover all of the specified criteria. In particular, our proposed approach estimates the relevance of a patient as a mixture of two components: the probability that the patient is retrieved by a patient ranking model for a given query, and the likelihood that the patient's records cover the query criteria. The latter is measured using the relevance towards each of the criteria stated in the query, which are represented in the form of sub-queries. We thoroughly evaluate our proposed approach using the test collections provided by the TREC 2011 and 2012 Medical Records tracks. Our results show significant improvements over existing strong baselines.
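The mixture described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the interpolation weight `lam`, the use of a product over per-criterion relevance scores as the coverage likelihood, the function names, and the toy scores are all hypothetical and should not be read as the authors' model.

```python
from math import prod


def coverage(criterion_scores):
    """Likelihood that a patient's records cover every criterion.

    Assumption: the per-criterion (sub-query) relevance scores are
    combined with a product, so a single uncovered criterion pulls
    the coverage towards zero.
    """
    return prod(criterion_scores)


def mixture_score(base_prob, criterion_scores, lam=0.5):
    """Interpolate the base ranking probability with the coverage likelihood.

    lam is a hypothetical mixing weight in [0, 1]; lam = 0 falls back
    to the base patient ranking model alone.
    """
    return (1.0 - lam) * base_prob + lam * coverage(criterion_scores)


def rank_patients(patients, lam=0.5):
    """Return patient ids sorted by decreasing mixture score.

    patients maps a patient id to (base_prob, [per-criterion scores]).
    """
    scored = {
        pid: mixture_score(base_prob, scores, lam)
        for pid, (base_prob, scores) in patients.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Toy data (invented): two criteria, e.g. 'lupus nephritis' and
# 'thrombotic thrombocytopenic purpura'.
patients = {
    "p1": (0.60, [0.9, 0.1]),  # strong on one criterion only
    "p2": (0.50, [0.8, 0.7]),  # covers both criteria
    "p3": (0.55, [0.0, 0.0]),  # covers neither criterion
}

ranking = rank_patients(patients, lam=0.5)
print(ranking)
```

In this toy example the patient covering both criteria (`p2`) moves to the top even though its base score is the lowest, which is the behaviour the abstract argues for. Setting `lam=0.0` restores the ordering of the base ranking model.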
