Information retrieval approach based on indexing text documents: Application to biomedical domain

Indexing documents with a controlled vocabulary makes it possible to assign index terms that do not appear in the document itself. In this paper we propose an information retrieval approach based on indexing text documents that combines two methods, one statistical and one semantic. The approach, based on the Vector Space Model (VSM), allows partial matching between a document and MeSH (Medical Subject Headings) terms. The stemming method applied when preparing the terms favors this partial matching. A similarity measure is then used to quantify the correspondence between a document and a term in the MeSH thesaurus. To filter the extracted concepts and keep only the relevant ones, we exploit the MeSH architecture for better indexing. We implemented the approach to evaluate its effectiveness; the experiments and the analysis of the results show that the proposed method gives good results compared to standard measures.
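As a rough illustration of how stemming can support partial matching in a vector space model, the sketch below scores a document against a few thesaurus-style terms with cosine similarity over stemmed term-frequency vectors. It is a minimal sketch under stated assumptions, not the method evaluated in the paper: the suffix stripper `toy_stem`, the raw term-frequency weighting, the helper names `stem_vector`, `cosine`, and `rank_terms`, and the three-term vocabulary are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def toy_stem(word):
    # Hypothetical crude suffix stripper, used only for illustration.
    # It is NOT the stemming algorithm used by the authors.
    for suffix in ("ations", "ation", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_vector(text):
    # Lowercase, keep alphabetic tokens, and count the stemmed forms.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(toy_stem(token) for token in tokens)


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_terms(document, vocabulary):
    # Score every vocabulary term against the document, best match first.
    doc_vec = stem_vector(document)
    scored = [(term, cosine(doc_vec, stem_vector(term))) for term in vocabulary]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Assumed toy inputs; the vocabulary is a hand-picked sample, not MeSH itself.
    document = "The patient had a lung infection and was given antibiotics."
    vocabulary = [
        "Anti-Bacterial Agents",
        "Respiratory Tract Infections",
        "Lung Diseases",
    ]
    for term, score in rank_terms(document, vocabulary):
        print(f"{score:.3f}  {term}")
```

Because "infection" and "Infections" reduce to the same stem, the multi-word term "Respiratory Tract Infections" receives a non-zero score even though only one of its three words occurs in the document, which is the kind of partial match the abstract describes. A term that shares no stem with the document scores zero and could be discarded before any hierarchy-based filtering.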
