Improving Document Retrieval by Automatic Query Expansion Using Collaborative Learning of Term-Based Concepts

Query expansion methods have been studied for a long time, with debatable success in many instances. This paper presents a new approach based on term concepts learned from other queries. It addresses two important issues in query expansion: the selection and the weighting of additional search terms. In contrast to other methods, the query is expanded by adding those terms that are most similar to the concept of individual query terms, rather than terms that are similar to the complete query or directly similar to the query terms. Experiments show that this kind of query expansion notably improves retrieval effectiveness, measured by recall/precision, compared with the standard vector space model and with pseudo relevance feedback. The approach can be used to improve document retrieval in digital libraries, document management systems, the WWW, etc.
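The idea of expanding a query with terms drawn from per-term concepts can be illustrated with a minimal sketch. The abstract does not define how a concept is computed, so everything below is an assumption made for illustration: a term's concept is taken to be the normalized sum of the document vectors judged relevant to earlier queries that contained that term, and the names `learn_term_concepts`, `expand_query`, `top_k`, and `alpha` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def normalize(vec):
    """Scale a sparse term-weight vector to unit Euclidean length."""
    norm = sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def learn_term_concepts(past_queries):
    """Build one concept vector per query term seen in earlier queries.

    past_queries: list of (query_terms, relevant_doc_vectors) pairs, where
    each document vector is a dict mapping term -> weight.
    Assumption: a term's concept is the normalized sum of all documents
    that were relevant to any earlier query containing that term.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for terms, docs in past_queries:
        for term in set(terms):
            for doc in docs:
                for t, w in doc.items():
                    sums[term][t] += w
    return {term: normalize(vec) for term, vec in sums.items()}


def expand_query(query_terms, concepts, top_k=2, alpha=0.5):
    """Add, for each query term, the top_k strongest terms of its concept.

    Selection: highest-weighted concept terms not already in the query.
    Weighting: concept weight scaled by alpha (a hypothetical damping factor);
    original query terms keep weight 1.0.
    """
    expanded = {t: 1.0 for t in query_terms}
    for term in query_terms:
        concept = concepts.get(term, {})
        candidates = sorted(
            ((t, w) for t, w in concept.items() if t not in query_terms),
            key=lambda tw: (-tw[1], tw[0]),  # by weight, ties broken by term
        )
        for t, w in candidates[:top_k]:
            expanded[t] = expanded.get(t, 0.0) + alpha * w
    return expanded


# Toy history: two earlier queries with one relevant document each.
history = [
    (["jaguar", "speed"], [{"jaguar": 1.0, "cat": 2.0, "animal": 1.0}]),
    (["car", "engine"], [{"car": 1.0, "engine": 2.0, "motor": 1.0}]),
]
concepts = learn_term_concepts(history)
print(expand_query(["jaguar"], concepts, top_k=1))
```

In this toy run the new query shares the term "jaguar" with an earlier query, so it picks up "cat" from that term's concept and nothing from the unrelated "car" query. A term with no learned concept leaves the query unchanged, which is one plausible fallback rather than a behaviour stated in the abstract.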
