Synonym, Topic Model and Predicate-Based Query Expansion for Retrieving Clinical Documents

Finding relevant documents in a large clinical data warehouse is a challenging task. To address it, we developed and tested three query expansion methods for retrieving clinical documents. First, we implemented a synonym expansion strategy that used a few selected vocabularies. Second, we trained a topic model on a large set of clinical documents and used it to identify related terms for query expansion. Third, we obtained related terms for query expansion from a large predicate database derived from Medline abstracts. The three expansion methods were tested on a set of clinical notes. Compared with the baseline method, all three achieved higher average recall and average F-measure, whereas average precision and precision at 10 decreased with every expansion. Among the three methods, the topic model-based method performed best in terms of recall and F-measure.
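To make the recall and precision trade-off concrete, the following is a minimal Python sketch of synonym-based query expansion and its evaluation. It is not the study's implementation: the synonym table, the four toy documents, the relevance judgments, and the function names (`expand_query`, `retrieve`, `precision_recall_f1`) are all assumptions made up for illustration, and matching is simple whole-word search rather than a real retrieval engine.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table; the vocabularies used in the study are not shown here.
SYNONYMS: Dict[str, List[str]] = {
    "heart attack": ["myocardial infarction", "mi"],
}

# Toy documents and relevance judgments, invented for this sketch.
DOCS: Dict[str, str] = {
    "d1": "Patient had a heart attack last year.",
    "d2": "History of myocardial infarction in 2009.",
    "d3": "Moved from MI to Ohio last year.",  # "MI" means Michigan here
    "d4": "No cardiac complaints.",
}
RELEVANT: Set[str] = {"d1", "d2"}


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the original query term followed by its synonyms, without duplicates."""
    base = query.lower()
    terms = [base]
    for syn in synonyms.get(base, []):
        if syn not in terms:
            terms.append(syn)
    return terms


def retrieve(terms: List[str], docs: Dict[str, str]) -> List[str]:
    """Return ids of documents containing any term as a whole word or phrase."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b") for t in terms]
    return [
        doc_id
        for doc_id, text in sorted(docs.items())
        if any(p.search(text.lower()) for p in patterns)
    ]


def precision_recall_f1(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Compute precision, recall, and balanced F-measure for one query."""
    hits = sum(1 for d in retrieved if d in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    baseline = retrieve(["heart attack"], DOCS)
    expanded = retrieve(expand_query("heart attack", SYNONYMS), DOCS)
    for name, result in (("baseline", baseline), ("expanded", expanded)):
        p, r, f = precision_recall_f1(result, RELEVANT)
        print(f"{name}: retrieved={result} P={p:.2f} R={r:.2f} F={f:.2f}")
```

In this toy setting the expanded query finds the second relevant note, so recall and F-measure rise, while the ambiguous abbreviation "mi" pulls in an irrelevant note and lowers precision. This mirrors the direction of the reported results only; the numbers say nothing about the study's actual data.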
