RePaLi Participation to CLEF eHealth IR Challenge 2014: Leveraging Term Variation

This paper describes the participation of RePaLi, a team composed of members of IRISA, LIMSI and STL, in the biomedical information retrieval challenge organized within the framework of CLEF eHealth. For this first participation, our approach relies on Indri, a state-of-the-art IR system based on statistical language modeling, and on semantic resources. The semantic resources and methods are used to handle term variation, such as synonyms, morpho-syntactic variants, abbreviations and nested terms. We explore different combinations of resources and Indri settings, mostly based on query expansion. For the submitted runs, our system achieves up to 67.40 P@10 and up to 67.93 NDCG@10.
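
As an illustration of query expansion with term variants, the following minimal sketch builds an Indri query string in which each query term is grouped with its variants. It is not the system described above: the function names `render_term` and `expand_query`, the variant dictionary and the example terms are hypothetical. Only the Indri query operators `#combine`, `#syn` and `#1` (exact ordered phrase) are taken from the Indri query language.

```python
def render_term(term):
    """Render a term for Indri: multi-word terms become an exact phrase (#1)."""
    words = term.split()
    if len(words) == 1:
        return words[0]
    return "#1( " + " ".join(words) + " )"


def expand_query(terms, variants):
    """Build an Indri query where each term is grouped with its variants.

    terms:    list of query terms (single words or multi-word terms)
    variants: hypothetical dict mapping a term to a list of variants
              (synonyms, abbreviations, morpho-syntactic variants, ...)
    """
    parts = []
    for term in terms:
        if not term.strip():
            continue  # skip empty terms
        alternatives = [term] + variants.get(term, [])
        rendered = [render_term(a) for a in alternatives]
        if len(rendered) == 1:
            parts.append(rendered[0])
        else:
            # #syn treats all alternatives as occurrences of one term
            parts.append("#syn( " + " ".join(rendered) + " )")
    return "#combine( " + " ".join(parts) + " )"


if __name__ == "__main__":
    # Illustrative data only, not taken from the challenge queries.
    variants = {"heart attack": ["myocardial infarction", "MI"]}
    print(expand_query(["heart attack", "treatment"], variants))
```

Grouping variants under `#syn` rather than appending them as extra query terms keeps a term with many variants from dominating the score; whether this choice matches the submitted runs is not stated here.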
