Knowledge-based query expansion to support scenario-specific retrieval of medical free text

When retrieving medical free text, users are often interested in answers pertinent to certain scenarios that correspond to common tasks in medical practice, e.g., treatment or diagnosis of a disease. A major challenge in handling such queries is that scenario terms in the query (e.g., treatment) are often too general to match specialized terms in relevant documents (e.g., chemotherapy). In this paper, we propose a knowledge-based query expansion method that exploits the UMLS knowledge source to augment the original query with additional terms that are specifically relevant to the query's scenario(s). We compare the proposed method with traditional statistical expansion, which adds terms that are statistically correlated with the query but not necessarily scenario-specific. Our study on two standard testbeds shows that the knowledge-based method, by providing scenario-specific expansion, yields notable improvements over the statistical method in terms of average precision-recall. On the OHSUMED testbed, for example, the improvement is more than 5% averaged over all scenario-specific queries studied, and about 10% for queries that mention certain scenarios, such as treatment of a disease and differential diagnosis of a symptom/disease.
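The idea of scenario-specific expansion can be illustrated with a minimal sketch. It is not the paper's implementation and does not use UMLS data or any UMLS API: the dictionary `TOY_KNOWLEDGE`, the mapping `SCENARIO_RELATIONS`, the relation names, and the function `expand_query` are all hypothetical, and the concept entries are illustrative only. The sketch assumes a query is expanded by following only those relations that match the scenario term found in the query.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge source: (concept, relation) -> related terms.
# Illustrative entries only; this is not UMLS content.
TOY_KNOWLEDGE: Dict[Tuple[str, str], List[str]] = {
    ("lung cancer", "treated_by"): ["chemotherapy", "radiotherapy"],
    ("lung cancer", "diagnosed_by"): ["biopsy", "chest x-ray"],
}

# Hypothetical mapping from scenario terms in a query to relation names.
SCENARIO_RELATIONS: Dict[str, str] = {
    "treatment": "treated_by",
    "diagnosis": "diagnosed_by",
}


def expand_query(query: str) -> List[str]:
    """Return the original query followed by scenario-specific terms."""
    text = query.lower()
    terms = [query]
    # Concepts from the toy knowledge source that appear in the query.
    concepts = sorted({concept for concept, _ in TOY_KNOWLEDGE if concept in text})
    for scenario, relation in SCENARIO_RELATIONS.items():
        if scenario not in text:
            continue  # follow only relations for scenarios the query mentions
        for concept in concepts:
            for term in TOY_KNOWLEDGE.get((concept, relation), []):
                if term not in terms:
                    terms.append(term)
    return terms


if __name__ == "__main__":
    print(expand_query("lung cancer treatment"))
```

In this sketch a treatment query picks up only treatment-related terms, whereas a purely statistical expansion could also add co-occurring terms such as diagnostic procedures that do not belong to the scenario.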
