Team IRLabDAIICT at ShARe/CLEF eHealth 2014 Task 3: User-centered Information Retrieval System for Clinical Documents

In this paper we, Team IRLabDAIICT, describe our participation in ShARe/CLEF eHealth 2014 Task 3: information retrieval for addressing questions related to patients' health based on clinical reports. We submitted six of the seven runs in this year's task. Our approach examines the relevance between documents and user-generated queries through experiments in query analysis. The major challenge is to bridge the conceptual gap between user-generated queries (informal queries) and biomedical terminology (formal queries). To address this concept matching problem, we incorporate MeSH (Medical Subject Headings), a medical thesaurus that we use to map layman terms to medical synonyms. We use a blind relevance feedback model for relevance feedback and a query-likelihood model for query expansion, the combination that performed best in our experiments. The retrieval system is evaluated using mean average precision, P@5, P@10, NDCG@5 and NDCG@10, with P@10 and NDCG@10 being the primary and secondary evaluation measures. The experiments were conducted on (a) the large 43.6 GB ShARe/CLEF 2013 Task 3 dataset, harvested by the EU-FP7 Khresmoi project, and (b) a new 2014 set of realistic English general-public queries based on the contents of discharge summaries. Our baseline run (run 1) obtained the highest result among our six runs, 0.706, as declared by the ShARe/CLEF organizing committee. As future work, we propose to incorporate a machine-learning-based model for predicting the retrieval algorithm.
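
As an illustration of the evaluation measures named above, the following is a minimal sketch of P@k and NDCG@k for a single query. It is not the evaluation tooling used by the task organizers. The function names, the toy relevance lists, and the NDCG variant (linear gain with a log2 rank discount, normalized by the ideal ordering of all judged documents) are assumptions made for this example.

```python
import math


def precision_at_k(relevances, k):
    """Fraction of the top-k retrieved documents with a relevance grade above 0."""
    top = relevances[:k]
    return sum(1 for r in top if r > 0) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain: linear gain, log2(rank + 1) discount (assumed variant)."""
    return sum(
        r / math.log2(rank + 1)
        for rank, r in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, judged, k):
    """DCG of the ranking divided by the DCG of the ideal ordering of all judged grades."""
    ideal = dcg_at_k(sorted(judged, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical toy data: relevance grades of the retrieved documents in rank
# order, and the grades of every judged document for the same query.
ranked = [1, 0, 1, 1, 0]
judged = [1, 1, 1, 1, 0, 0]

print("P@5    =", precision_at_k(ranked, 5))
print("NDCG@5 =", round(ndcg_at_k(ranked, judged, 5), 4))
```

Averaging these per-query values over all queries in a run gives the run-level P@k and NDCG@k figures of the kind reported for the task.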