A Study on Some Tasks, Corpus and Resources of Medical Information Retrieval

Background/Objectives: This paper gives an overview of the tasks involved in the retrieval process, as well as the corpora and resources used in medical information retrieval.
Methods/Statistical Analysis: An inverted file representation is used in the retrieval process to associate the documents in the corpus with the various search terms. Conventional statistical ranking functions such as Jaccard, Okapi and Euclidean have been widely used to rank retrieved medical documents. An extractive, informative, generic, monolingual, single-document summarizer is used to produce medical domain-specific summaries, and a sentence ranking method selects the most appropriate sentences for the final summary.
Findings: Studies reveal that people search the web and read medical information in order to stay informed about their health. In the medical domain, the richest and most widely used source of information is MEDLINE. Because acronyms are used frequently in the medical literature, using the terms that appear in documents as indexing keywords is not effective. In addition, a Bag of Words representation cannot capture the semantic meaning of terms. Domain-specific thesauri such as UMLS, MeSH and the Gene Ontology are available for biomedical retrieval. These thesauri provide synonyms, hypernyms and hyponyms of a given term, but they do not take its context into account; as a result, the retrieval results obtained with domain-specific thesauri are somewhat conflicting. Using Wikipedia as a resource for biomedical retrieval makes it possible to identify which lexical variant of a term should be used in a given context. Conventional ranking functions fail to capture the inherent features of natural language text, whereas evolutionary algorithm based ranking can enhance retrieval performance. Any domain-specific summarizer must treat the similarity between sentences as an essential feature for summarization.
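The retrieval steps named under Methods/Statistical Analysis (an inverted file plus conventional ranking functions) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the whitespace tokenizer, the BM25 parameters `k1 = 1.2` and `b = 0.75`, the smoothed IDF variant, the helper names (`build_inverted_index`, `rank`, and so on) and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization; a real medical
    # system would also handle acronyms, stemming and stop words.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to its posting list {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def jaccard(query_terms, doc_terms):
    # Overlap of the two term sets divided by their union.
    q, d = set(query_terms), set(doc_terms)
    return len(q & d) / len(q | d) if q | d else 0.0


def bm25(query_terms, doc_id, index, doc_lens, k1=1.2, b=0.75):
    # Okapi BM25 with a smoothed IDF, log(1 + (N - df + 0.5) / (df + 0.5)).
    n_docs = len(doc_lens)
    avgdl = sum(doc_lens.values()) / n_docs
    score = 0.0
    for term in set(query_terms):
        postings = index.get(term, {})
        tf = postings.get(doc_id, 0)
        if tf == 0:
            continue
        df = len(postings)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf + k1 * (1 - b + b * doc_lens[doc_id] / avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score


def euclidean_distance(query_terms, doc_terms):
    # Distance between raw term-frequency vectors (smaller = more similar).
    q, d = Counter(query_terms), Counter(doc_terms)
    return math.sqrt(sum((q[t] - d[t]) ** 2 for t in set(q) | set(d)))


def rank(query, docs, method="bm25"):
    index = build_inverted_index(docs)
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_lens = {doc_id: len(t) for doc_id, t in tokens.items()}
    q = tokenize(query)
    # The inverted index restricts scoring to documents sharing a query term.
    candidates = {d for term in q for d in index.get(term, {})}
    if method == "jaccard":
        scored = [(jaccard(q, tokens[d]), d) for d in candidates]
    elif method == "euclidean":
        # Negate the distance so that a descending sort puts closest first.
        scored = [(-euclidean_distance(q, tokens[d]), d) for d in candidates]
    else:
        scored = [(bm25(q, d, index, doc_lens), d) for d in candidates]
    # Highest score first; ties broken by document id for determinism.
    return [d for s, d in sorted(scored, key=lambda x: (-x[0], x[1]))]


# Hypothetical toy corpus.
docs = {
    "d1": "breast cancer screening guidelines",
    "d2": "type 2 diabetes insulin therapy",
    "d3": "cancer chemotherapy side effects and cancer survival",
}
print(rank("cancer therapy", docs))                    # → ['d2', 'd3', 'd1']
print(rank("cancer therapy", docs, method="jaccard"))  # → ['d1', 'd2', 'd3']
```

On this toy corpus Jaccard and Euclidean favour the short document `d1`, whereas BM25 ranks `d2` first because its matching term `therapy` is rarer in the collection and therefore carries a higher IDF. The naive tokenizer also fails to match `therapy` inside `chemotherapy`, a small instance of the limits of surface-form keyword matching discussed under Findings.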
Applications/Improvements: Improvements in retrieval results are achieved by using context-aware keywords as indexing keywords and a highly robust hybrid evolutionary algorithm based ranking function to order the retrieved documents.
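The idea of an evolutionary ranking function can be illustrated, for orientation only, by a plain genetic algorithm that learns weights for a linear combination of conventional scores. This is a swapped-in, simplified technique and not the hybrid algorithm the abstract refers to: the feature set (Jaccard, BM25, negated Euclidean distance), the hypothetical `TRAINING` relevance judgments, mean average precision (MAP) as the fitness, truncation selection, one-point crossover and Gaussian mutation are all assumptions.

```python
import random

# Hypothetical training data: for each query, (feature_vector, is_relevant)
# pairs, with features assumed to be [jaccard, bm25, -euclidean].
TRAINING = {
    "q1": [([0.20, 0.52, -2.0], 0), ([0.17, 1.01, -2.2], 1), ([0.14, 0.59, -2.6], 1)],
    "q2": [([0.50, 0.30, -1.0], 1), ([0.10, 0.90, -3.0], 0), ([0.30, 0.40, -1.5], 1)],
}


def average_precision(ranked_labels):
    # Mean of precision@i taken at each relevant position i.
    hits, total = 0, 0.0
    for i, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def fitness(weights, data=TRAINING):
    # MAP of the ranking induced by the weighted sum of features.
    aps = []
    for pairs in data.values():
        ranked = sorted(pairs, key=lambda p: -sum(w * f for w, f in zip(weights, p[0])))
        aps.append(average_precision([rel for _, rel in ranked]))
    return sum(aps) / len(aps)


def evolve(n_features=3, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    # Assumes n_features >= 2 (needed for one-point crossover).
    rng = random.Random(seed)
    # Seed the population with the equal-weight baseline plus random vectors.
    population = [[1.0] * n_features] + [
        [rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 4]  # elitism: best survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)           # truncation selection
            cut = rng.randrange(1, n_features)    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [w + rng.gauss(0, 0.3) if rng.random() < mutation_rate else w
                     for w in child]              # Gaussian mutation
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


best_weights, best_map = evolve()
print("weights:", [round(w, 2) for w in best_weights], "MAP:", round(best_map, 3))
```

Because the equal-weight vector is placed in the initial population and the elite survives every generation unchanged, the returned weights are never worse on the training data than that baseline. On a real collection the fitness would be measured on held-out queries to avoid overfitting the learned ranking function.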
