Query expansion using lexical-semantic relations

Applications such as office automation, news filtering, and help facilities in complex systems require the ability to retrieve documents from full-text databases, where vocabulary problems can be particularly severe. Experiments performed on small collections with single-domain thesauri suggest that expanding query vectors with words that are lexically related to the original query words can ameliorate some of the problems caused by mismatched vocabularies. This paper examines the utility of lexical query expansion in the large, diverse TREC collection. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results show that this query expansion technique makes little difference in retrieval effectiveness if the original queries are relatively complete descriptions of the information being sought, even when the concepts to be expanded are selected by hand. Less well-developed queries can be significantly improved by expanding hand-chosen concepts. However, an automatic procedure that can approximate the set of hand-picked synonym sets has yet to be devised, and expanding by automatically generated synonym sets can degrade retrieval performance.
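To make the expansion step concrete, here is a minimal sketch of adding lexically related words to a query vector by following typed links from selected synonym sets. It is illustrative only and rests on several assumptions that are not from the paper. A tiny hand-built lexicon (`SYNSETS`, with made-up synset ids) stands in for WordNet. The per-relation weights in `LINK_WEIGHTS` are invented. The helper names `expand_query` and `cosine` are hypothetical. The `chosen_synsets` argument plays the role of the concepts selected for expansion, for example by hand. The sketch does not reproduce the paper's actual weighting or retrieval setup.

```python
from math import sqrt

# Hypothetical toy lexicon standing in for WordNet: each synset id maps to its
# member words and to typed links pointing at other synsets.
SYNSETS = {
    "car.n.01": {
        "words": ["car", "auto", "automobile"],
        "links": {"hypernym": ["vehicle.n.01"], "hyponym": ["cab.n.01"]},
    },
    "vehicle.n.01": {"words": ["vehicle"], "links": {"hyponym": ["car.n.01"]}},
    "cab.n.01": {"words": ["cab", "taxi"], "links": {"hypernym": ["car.n.01"]}},
}

# Assumed down-weighting factors per relation type (invented for illustration).
LINK_WEIGHTS = {"synonym": 0.8, "hypernym": 0.5, "hyponym": 0.5}


def expand_query(query_terms, chosen_synsets, link_weights=LINK_WEIGHTS):
    """Return a term -> weight vector: original terms get 1.0, related words less."""
    originals = set(query_terms)
    vec = {t: 1.0 for t in originals}

    def add(word, weight):
        # Never overwrite an original query term; if a word is reached through
        # several links, keep the strongest weight.
        if word not in originals:
            vec[word] = max(vec.get(word, 0.0), weight)

    for sid in chosen_synsets:
        synset = SYNSETS[sid]
        # Synonyms are the other members of the same synset.
        for word in synset["words"]:
            add(word, link_weights["synonym"])
        # Follow each typed link one step and add the target synset's words.
        for relation, targets in synset["links"].items():
            weight = link_weights.get(relation)
            if weight is None:
                continue  # relation types without a weight are not followed
            for target in targets:
                for word in SYNSETS[target]["words"]:
                    add(word, weight)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    query = ["car", "rental"]
    document = {"automobile": 1.0, "hire": 1.0}
    expanded = expand_query(query, ["car.n.01"])
    print("original:", cosine({t: 1.0 for t in query}, document))
    print("expanded:", cosine(expanded, document))
```

In this toy case the unexpanded query shares no terms with the document and scores zero, while the expanded query matches through "automobile". The same mechanism also adds words such as "taxi" that may be irrelevant to the user's need, and adding the wrong synset would inject such noise wholesale. This is one plausible reading of why expansion helps sparse queries but can hurt when the concepts are chosen poorly.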