Relevance weighting using distance between term occurrences

Recent work has achieved promising retrieval performance using distance between term occurrences as a primary estimator of document relevance. A major benefit of this approach is that relevance scoring does not rely on collection frequency statistics. A theoretical framework for lexical spans is now proposed which encompasses these approaches and suggests a number of important directions for future experimental work. Based on the formalism, approaches to issues such as scoring partial spans, treatment of repeated term occurrences within spans, and the importance of ordering are proposed. Consideration is given to the practical application of the formalism to both locating and scoring concept intersections and to locating phrases (with an estimate of confidence) despite intervening or substituted words.
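
To make the general idea of distance-based scoring concrete, the following is a minimal sketch and not the formalism proposed here. It assumes a document is a list of tokens, finds the shortest window containing every query term, and scores the document as the number of distinct query terms divided by the window length. The function names `shortest_span` and `span_score` and the scoring formula are illustrative assumptions. Note that the score uses no collection frequency statistics, and that this simple version assigns zero to partial spans and ignores term order, two of the issues the formalism is intended to address.

```python
from collections import Counter


def shortest_span(tokens, query_terms):
    """Return (start, end), inclusive, of the shortest window that
    contains every query term at least once, or None if no such
    window exists. Hypothetical helper for illustration only."""
    needed = set(query_terms)
    if not needed:
        return None
    counts = Counter()  # occurrences of each query term inside the window
    covered = 0         # number of distinct query terms inside the window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers all terms.
        while covered == len(needed):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            ltok = tokens[left]
            if ltok in needed:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    covered -= 1
            left += 1
    return best


def span_score(tokens, query_terms):
    """Assumed scoring rule: distinct query terms / span length.
    Tighter spans score higher; a missing term gives 0.0."""
    span = shortest_span(tokens, query_terms)
    if span is None:
        return 0.0
    start, end = span
    return len(set(query_terms)) / (end - start + 1)
```

For example, in the token list for "the quick brown fox jumps over the lazy dog", the query terms "quick" and "fox" are covered by a three-token span, so the sketch assigns a score of 2/3; a query containing a term absent from the document scores zero.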