The Exploration and Analysis of Using Multiple Thesaurus Types for Query Expansion in Information Retrieval.

This paper proposes the use of multiple thesaurus types for query expansion in information retrieval. A hand-crafted thesaurus, a corpus-based co-occurrence thesaurus, and a syntactic-relation-based thesaurus are combined and used as a tool for query expansion. Simple word sense disambiguation is performed to avoid misleading expansion terms. Experiments on the TREC-7 collection showed that this method significantly improves retrieval performance. Failure analysis was carried out on the cases in which the proposed method fails to improve retrieval effectiveness. We found that queries containing negative statements and multiple aspects might cause problems for the proposed method.
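The idea of combining several thesaurus types can be illustrated with a minimal sketch. It is not the paper's actual weighting scheme: the toy similarity tables, the simple averaging rule, and the names `combined_similarity` and `expand_query` are all assumptions made for illustration, and the word sense disambiguation step is omitted.

```python
# Toy similarity tables, one per thesaurus type. All entries and scores
# are invented for illustration; they do not come from the paper.
HAND_CRAFTED = {"car": {"automobile": 0.9, "vehicle": 0.7}}
COOCCURRENCE = {"car": {"automobile": 0.6, "engine": 0.5},
                "rental": {"lease": 0.4}}
SYNTACTIC = {"car": {"vehicle": 0.5, "truck": 0.6},
             "rental": {"lease": 0.8}}
THESAURI = [HAND_CRAFTED, COOCCURRENCE, SYNTACTIC]


def combined_similarity(term, candidate, thesauri):
    """Average the similarity over all thesauri (assumed combination rule).

    A thesaurus that does not list the pair contributes 0.
    """
    total = sum(t.get(term, {}).get(candidate, 0.0) for t in thesauri)
    return total / len(thesauri)


def expand_query(query_terms, thesauri, top_k=3):
    """Return the top_k candidate terms ranked by summed similarity
    to all query terms, so terms related to the whole query rank higher."""
    candidates = set()
    for thesaurus in thesauri:
        for q in query_terms:
            candidates.update(thesaurus.get(q, {}))
    candidates -= set(query_terms)  # never re-add an original query term

    scores = {
        c: sum(combined_similarity(q, c, thesauri) for q in query_terms)
        for c in candidates
    }
    # Sort by descending score, then alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


expanded = expand_query(["car", "rental"], THESAURI)
for term, score in expanded:
    print(f"{term}\t{score:.3f}")
```

In this sketch a term supported by more than one thesaurus (such as "automobile" here) outranks a term found in only one, which is the intuition behind combining evidence from different thesaurus types.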
