A concept-based approach for indexing documents in IR

This paper addresses two important problems related to the use of semantics in information retrieval (IR). The first concerns the representation of document semantics and its proper use in retrieval. The second is the integration of semantic-based retrieval with "traditional" keyword-based retrieval. The proposed approach represents document content by the best semantic network, called the document semantic core, which is built in two main steps. The first step extracts concepts (single-word and multiword) from a document, guided by an external general-purpose ontology, namely WordNet. The second step builds the best semantic network by globally disambiguating the extracted concepts with respect to the document. The selected concept senses form the nodes of the semantic network, and the similarity values between them form the arcs. The resulting scored concept senses are used for conceptual indexing in information retrieval.

KEYWORDS: information retrieval, semantic representation of documents, semantic similarity measures, conceptual indexing, ontologies, WordNet.
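The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sense inventory `SENSES`, the similarity table `SIM`, the greedy longest-match extraction, the exhaustive search over sense combinations, and the node score (sum of a sense's arc weights) are all assumptions made for illustration. A real system would draw senses and similarity values from WordNet.

```python
from itertools import product

# Hypothetical toy sense inventory standing in for WordNet:
# each concept (single-word or multiword) maps to its candidate senses.
SENSES = {
    "bank": ["bank#finance", "bank#river"],
    "interest": ["interest#money", "interest#curiosity"],
    "interest rate": ["interest_rate#finance"],
    "deposit": ["deposit#money", "deposit#sediment"],
}

# Hypothetical symmetric similarity values between senses
# (a stand-in for a WordNet-based similarity measure).
SIM = {
    frozenset(("bank#finance", "interest_rate#finance")): 0.9,
    frozenset(("bank#finance", "deposit#money")): 0.8,
    frozenset(("interest_rate#finance", "deposit#money")): 0.7,
    frozenset(("bank#river", "deposit#sediment")): 0.6,
}


def similarity(a, b):
    """Look up the similarity of two senses; unknown pairs score 0."""
    return SIM.get(frozenset((a, b)), 0.0)


def extract_concepts(tokens, lexicon, max_len=3):
    """Step 1 (assumed strategy): greedy longest-match concept extraction."""
    concepts, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                concepts.append(candidate)
                i += n
                break
        else:
            i += 1  # token is not part of any known concept
    return concepts


def semantic_core(concepts):
    """Step 2 (assumed strategy): pick one sense per concept so that the
    total pairwise similarity of the resulting network is maximal."""
    concepts = list(dict.fromkeys(concepts))  # drop repeats, keep order
    best_score, best = -1.0, ()
    for assignment in product(*(SENSES[c] for c in concepts)):
        score = sum(
            similarity(a, b)
            for i, a in enumerate(assignment)
            for b in assignment[i + 1:]
        )
        if score > best_score:
            best_score, best = score, assignment
    # Nodes are the selected senses; each node is scored by the sum of
    # the weights of its arcs to the other selected senses.
    return {s: sum(similarity(s, t) for t in best if t != s) for s in best}


tokens = "the bank raised the interest rate on the deposit".split()
concepts = extract_concepts(tokens, SENSES)
core = semantic_core(concepts)
for sense, score in core.items():
    print(sense, round(score, 2))
```

The exhaustive search is exponential in the number of ambiguous concepts, so it only suits a toy example; it is used here because it makes the "best network" criterion explicit.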
