Paper information - Relevance weighting using distance between term occurrences

Recent work has achieved promising retrieval performance using distance between term occurrences as a primary estimator of document relevance. A major benefit of this approach is that relevance scoring does not rely on collection frequency statistics. A theoretical framework for lexical spans is now proposed which encompasses these approaches and suggests a number of important directions for future experimental work. Based on the formalism, approaches to issues such as scoring partial spans, treatment of repeated term occurrences within spans, and the importance of ordering are proposed. Consideration is given to the practical application of the formalism to both locating and scoring concept intersections and to locating phrases (with an estimate of confidence) despite intervening or substituted words.
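As a rough illustration of distance-based scoring in general, the sketch below finds the shortest window of a tokenized document that contains every query term and scores the document by how compact that window is. It is not the formalism proposed in the paper: the function names `shortest_span` and `span_score`, the whitespace tokenization, and the score (number of distinct query terms divided by span length) are all assumptions made for illustration. The one property it shares with the approach described above is that no collection frequency statistics are used.

```python
from collections import Counter


def shortest_span(tokens, query_terms):
    """Return (start, end) indices of the shortest window of `tokens`
    containing every term in `query_terms`, or None if no window exists."""
    needed = set(query_terms)
    if not needed:
        return None
    counts = Counter()   # occurrences of each query term inside the window
    covered = 0          # number of distinct query terms inside the window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink the window from the left while it still covers all terms.
        while covered == len(needed):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            ltok = tokens[left]
            if ltok in needed:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    covered -= 1
            left += 1
    return best


def span_score(tokens, query_terms):
    """Hypothetical score: distinct query terms divided by span length.
    Returns 0.0 when some query term is missing (partial spans are not scored)."""
    span = shortest_span(tokens, query_terms)
    if span is None:
        return 0.0
    length = span[1] - span[0] + 1
    return len(set(query_terms)) / length


doc = "the quick brown fox jumps over the lazy dog".split()
print(shortest_span(doc, ["quick", "fox"]))
print(span_score(doc, ["quick", "fox"]))
```

This sketch ignores the issues the abstract raises, namely partial spans, repeated term occurrences within a span, and term ordering; each of those would need its own treatment.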

David Hawking | Paul B. Thistlewaite

[1] Charles L. A. Clarke, et al. Shortest Substring Ranking (MultiText Experiments for TREC-4), 1995, TREC.

[2] James Allan, et al. Approaches to passage retrieval in full text information systems, 1993, SIGIR.

[3] James Allan, et al. Automatic Retrieval With Locality Information Using SMART, 1992, TREC.

[4] Proceedings of The Fourth Text REtrieval Conference, TREC 1995, Gaithersburg, Maryland, USA, November 1-3, 1995, 1995, TREC.

[5] David Hawking, et al. Proximity Operators - So Near And Yet So Far, 1995, TREC.

[6] Hans Peter Luhn, et al. The Automatic Creation of Literature Abstracts, 1958, IBM J. Res. Dev.

[7] Alan F. Smeaton, et al. Using WordNet in a Knowledge-Based Approach to Information Retrieval, 1995.