Matching Fusion with Conceptual Indexing

Many studies have addressed the term-mismatch problem, which arises when different terms or words are used to express the same meaning. We also introduce another problem, the over-specialized document problem, which arises when IR systems prefer documents that have a poor query-document intersection but high term weights over documents that have a rich query-document intersection but low term weights. In this study, we propose to use multiple types of indexing elements simultaneously (n-grams, keywords, and concepts) instead of keywords alone. We combine them with a late data-fusion technique. With the proposed model, we also try to overcome the over-specialized document problem. We validated the model experimentally on the ImageCLEF2011 test collection, using the UMLS2009 Meta-thesaurus and the MetaMap tool for mapping text to UMLS concepts.
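As an illustration of what late fusion over several index types can look like, the following minimal Python sketch ranks documents separately per index type and then merges the per-index scores. The abstract does not specify the fusion function, so the min-max normalization and the CombSUM-style sum used here are assumptions, as are the function names, the document identifiers, and the toy scores.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one run's scores to [0, 1] so that runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tie in this run; give each the same normalized score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def late_fusion(runs):
    """Sum normalized scores across runs (CombSUM-style; an assumption,
    not necessarily the fusion function used in the paper)."""
    fused = defaultdict(float)
    for scores in runs.values():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += s
    # Highest fused score first; ties broken by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical retrieval scores for one query, one run per index type.
runs = {
    "ngrams": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "keywords": {"d1": 0.5, "d2": 1.5},
    "concepts": {"d2": 3.0, "d3": 1.0},
}

ranking = late_fusion(runs)
for doc, score in ranking:
    print(doc, round(score, 3))
```

In this toy input, a document that matches the query in all three indexes accumulates evidence from each run, which is the intuition behind combining n-gram, keyword, and concept matching after retrieval rather than at indexing time.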
