Document/query expansion based on selecting significant concepts for context based retrieval of medical images

In the medical image retrieval literature, there are two main approaches: content-based retrieval, which uses the visual information contained in the image itself, and context-based retrieval, which uses the metadata and labels associated with the images. Our work belongs to the context-based category: queries are composed of medical keywords, and the documents are metadata that succinctly describe the medical images. A main difference between context-based image retrieval and textual document retrieval is that in image retrieval the narrative description is very brief and typically cannot describe the entire image content, which degrades retrieval quality. One solution offered in the literature is to add new relevant terms to both the query and the documents using expansion techniques. Nevertheless, using native terms to retrieve images has several disadvantages, such as term ambiguity. Several studies have shown that mapping text to concepts can improve the semantic representation of textual information. However, using concepts in the retrieval process has its own problems, such as erroneous semantic relations between concepts in the semantic resource. In this paper, we propose a new expansion method for medical text (query/document) based on retro-semantic mapping between textual terms and UMLS concepts that are relevant in medical image retrieval. More precisely, we propose mapping the medical text of queries and documents into concepts and then applying a concept-selection method to keep only the most significant concepts. The most representative term (preferred name) identified in the UMLS for each selected concept is then added to the initial text. Experiments carried out with the ImageCLEF 2009 and 2010 datasets showed that the proposed approach significantly improves retrieval accuracy and outperforms the approaches offered in the literature.
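
The three steps described above (map text to concepts, keep only the most significant concepts, append each kept concept's preferred name to the original text) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `TOY_LEXICON` stands in for a real UMLS lookup, its concept identifiers are invented rather than real CUIs, and the caller-supplied scores with a top-`k` cut-off are only a placeholder for the concept-selection method, which the abstract does not specify.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicon standing in for a UMLS lookup:
# term -> (concept id, preferred name). The ids are invented, not real CUIs.
TOY_LEXICON: Dict[str, Tuple[str, str]] = {
    "mri": ("C_A", "magnetic resonance imaging"),
    "brain": ("C_B", "brain"),
    "tumor": ("C_C", "neoplasm"),
    "image": ("C_D", "image"),
}


def map_to_concepts(text: str) -> List[Tuple[str, str]]:
    """Map whitespace-separated tokens to concepts, dropping duplicates."""
    concepts: List[Tuple[str, str]] = []
    for token in text.lower().split():
        concept = TOY_LEXICON.get(token)
        if concept is not None and concept not in concepts:
            concepts.append(concept)
    return concepts


def select_concepts(
    concepts: List[Tuple[str, str]],
    score: Callable[[str], float],
    k: int,
) -> List[Tuple[str, str]]:
    """Keep the k highest-scoring concepts (placeholder selection rule)."""
    ranked = sorted(concepts, key=lambda c: score(c[0]), reverse=True)
    return ranked[:k]


def expand(text: str, score: Callable[[str], float], k: int = 2) -> str:
    """Append the preferred names of the selected concepts to the text."""
    selected = select_concepts(map_to_concepts(text), score, k)
    lowered = text.lower()
    # Crude substring check so a name already present is not added again.
    additions = [name for _, name in selected if name not in lowered]
    if not additions:
        return text
    return text + " " + " ".join(additions)


# Assumed significance scores, chosen by hand for this example.
scores = {"C_A": 0.9, "C_B": 0.2, "C_C": 0.8, "C_D": 0.1}
expanded = expand("MRI brain tumor", lambda cid: scores.get(cid, 0.0), k=2)
print(expanded)  # MRI brain tumor magnetic resonance imaging neoplasm
```

The same `expand` function would be applied to both queries and document metadata, since the abstract describes expanding both sides. Keeping the scoring function as a parameter leaves room for any concrete significance measure.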
