Complementing WordNet with Roget’s and Corpus-based Thesauri for Information Retrieval

This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget's Thesaurus and corpus-derived thesauri. Words and relations that are not included in WordNet can be found in the corpus-derived thesauri. The effects of polysemy can be minimized with a weighting method that considers all query terms and all of the thesauri. Experimental results show that our method significantly enhances information retrieval performance.
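As a rough illustration of the weighting idea, the sketch below scores each candidate expansion term against every query term and every thesaurus, so a candidate tied to only one sense of one query term ends up with a low weight. This is not the paper's actual formula: the function name `expansion_weights`, the averaging over thesauri, and the toy similarity values are all assumptions made for illustration.

```python
from collections import defaultdict


def expansion_weights(query_weights, thesauri):
    """Score candidate expansion terms (hypothetical scheme, not the paper's).

    query_weights: dict mapping each query term to its weight.
    thesauri: list of dicts, each mapping term -> {related_term: similarity}.
    A candidate's score sums, over all query terms, its similarity to that
    query term averaged over all thesauri, scaled by the query term weight.
    """
    scores = defaultdict(float)
    n_thesauri = len(thesauri)
    for q_term, q_weight in query_weights.items():
        for thesaurus in thesauri:
            for candidate, sim in thesaurus.get(q_term, {}).items():
                if candidate in query_weights:
                    continue  # do not re-add original query terms
                scores[candidate] += q_weight * sim / n_thesauri
    return dict(scores)


# Toy data (invented): one hand-made and one corpus-derived thesaurus.
hand_made = {
    "bank": {"riverbank": 0.8, "lender": 0.6},
    "loan": {"lender": 0.7},
}
corpus_based = {
    "bank": {"lender": 0.5},
    "loan": {"lender": 0.9, "mortgage": 0.4},
}

scores = expansion_weights({"bank": 1.0, "loan": 1.0}, [hand_made, corpus_based])
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy query, "lender" is supported by both query terms and both thesauri, so it outranks "riverbank", which is related only to the unintended sense of "bank" in a single thesaurus.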