Evaluation of Vector Space Models for Medical Disorders Information Retrieval

Consumers often search online for the medical and health care information they need. To improve this access, the ShARe/CLEF eHealth Evaluation Lab (SHEL) organized a shared task on information retrieval for medical disorders in 2013. This paper describes our participation in that task. To detect latent semantic relevance between queries and webpages about disorders, we used a semantic vector model based on distributional semantics as the retrieval model. Specifically, variants of random indexing were employed to generate document and term representations. In addition, to reduce the lexical gap between different clinical expressions of the same concept, queries were expanded using the UMLS. We developed a baseline retrieval method using the vector space model (VSM) and semantic vector models with different random indexing building procedures, and evaluated each with and without query expansion in the shared task. The best performance was achieved by VSM, with a MAP of 0.1480, P@10 of 0.3700, and nDCG@10 of 0.3363. Experimental results indicate that VSM and the semantic vector model are complementary, and suggest that combining these methods may further improve performance.
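As an illustration of the random indexing idea mentioned above, the following is a minimal sketch and not the implementation used in the shared task. It assumes the basic document-based variant: each document receives a sparse ternary random index vector, each term vector is the sum of the index vectors of the documents containing that term, and queries and documents are compared by cosine similarity over summed term vectors. All function names, the whitespace tokenizer, and the parameter values (`dim`, `nonzero`, `seed`) are hypothetical choices made for this example.

```python
import numpy as np


def index_vector(rng, dim, nonzero):
    """Sparse ternary random vector: `nonzero` entries are +1 or -1, the rest 0."""
    vec = np.zeros(dim)
    positions = rng.choice(dim, size=nonzero, replace=False)
    vec[positions] = rng.choice([-1.0, 1.0], size=nonzero)
    return vec


def build_term_vectors(docs, dim=512, nonzero=8, seed=0):
    """Term vector = sum of the index vectors of the documents containing the term.

    Tokenization is a plain lowercase whitespace split (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    doc_index = [index_vector(rng, dim, nonzero) for _ in docs]
    term_vectors = {}
    for d, text in enumerate(docs):
        for term in text.lower().split():
            term_vectors.setdefault(term, np.zeros(dim))
            term_vectors[term] += doc_index[d]
    return term_vectors


def embed(text, term_vectors, dim=512):
    """Represent a query or document as the sum of its known term vectors."""
    vec = np.zeros(dim)
    for term in text.lower().split():
        if term in term_vectors:
            vec += term_vectors[term]
    return vec


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def rank(query, docs, term_vectors, dim=512):
    """Return document indices ordered by decreasing cosine similarity to the query."""
    q = embed(query, term_vectors, dim)
    scores = [cosine(q, embed(doc, term_vectors, dim)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

With a toy collection, `rank("heart attack", docs, build_term_vectors(docs))` returns document indices ordered from most to least similar. Because terms that co-occur in the same documents accumulate the same index vectors, a query can score highly against a document even when they share only part of their vocabulary, which is the latent relevance effect the abstract refers to.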
