Experiments in the Probabilistic Retrieval of Full Text Documents

The experiments described here constitute a continuation of a research program whose object is to find probabilistically sound, yet simple and powerful, ways of combining search clues in full-text retrieval. The methodology investigated for ad hoc retrieval is that of logistic regression, in which the retrieval rule takes the form of a regression equation fitted to learning data. Most of the variables used in the regression take the form of means rather than the more customary sums, and it is argued that this is logically preferable. Radical manual reformulations of the topics were tried out and found to boost retrieval effectiveness. For routing retrieval, an approach based on the Assumption of Linked Dependence, involving the extraction of relevance-associated stems from feedback documents, is investigated. One characteristic of this approach is that only minimal use is made of the original topic formulation.
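
As an illustration only, the idea of a logistic-regression retrieval rule whose variables are means over matching terms can be sketched as follows. This is a minimal sketch under stated assumptions, not the regression equation used in the experiments: the single clue (log relative term frequency), the function names `mean_features` and `relevance_probability`, and the intercept and coefficient values are all hypothetical.

```python
import math


def mean_features(query_terms, doc_term_freqs, doc_length):
    """Return per-term clues averaged over the query terms found in the document.

    Hypothetical feature choice: the log of each matching term's relative
    frequency in the document. The clue is averaged, not summed, so its scale
    does not grow with the number of matching terms.
    """
    matches = [t for t in query_terms if doc_term_freqs.get(t, 0) > 0]
    if not matches:
        return None  # no matching terms, nothing to score
    total = sum(math.log(doc_term_freqs[t] / doc_length) for t in matches)
    return [total / len(matches)]


def relevance_probability(features, intercept, coefficients):
    """Turn a fitted log-odds equation into a probability of relevance."""
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Illustrative values only; real coefficients would be fitted to learning data.
doc = {"retrieval": 2, "regression": 2}
features = mean_features(["retrieval", "regression", "routing"], doc, doc_length=4)
score = relevance_probability(features, intercept=-1.0, coefficients=[0.5])
```

Documents would then be ranked by `score`. Because the variable is a mean, a document matching two terms with the same relative frequency receives the same feature value as one matching a single such term, whereas a sum would double it.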