A new method of weighting query terms for ad-hoc retrieval

K.L. Kwok, email: kklqc@cunyvm.cuny.edu, Dept., Queens College, City University of New York, Flushing, NY 11367, USA.

Ad-hoc retrieval relies on the evidence from a user's query to provide a sufficient variety of terms as well as different term frequencies for differentiating term importance. Short queries lack both types of information. A new method of automatically weighting query terms for ad-hoc retrieval is introduced that works for short queries. It is based on the term usage statistics in a collection, and no training is required. Experiments with both the TREC2 and TREC4 ad-hoc queries show that this weighting scheme can provide significantly better results at the initial retrieval stage. At the expanded query stage, results vary from equal to significantly better than those relying on the original query weights. In particular, this automatic method provides similar improvements for extra-short queries of only two to four content terms.
