RMIT University at TREC 2008: Relevance Feedback Track

The 2008 Relevance Feedback Track provided a set of relevance judgements for participants to use. These judgements are based on data from previous TREC tracks (the 2004-2006 Terabyte Tracks and the 2007 Million Query Track). Different runs for the Relevance Feedback Track made use of varying numbers of relevant and non-relevant documents (see Section 4 for details).

Given the set of available documents with known relevance judgements, a query expansion scheme aims to identify the set of terms that, when added to the original query, is most likely to improve retrieval performance. For all the experiments reported in this paper, queries were expanded using only terms that occur in the documents provided for expansion. No new terms were introduced from external sources. Let R be the set of documents in the collection that are known to be relevant for the current query (that is, the set of relevant documents provided as part of the Track framework). To expand a query, a set of candidate expansion terms S is first established. In our experiments, we explore two approaches for the construction of the candidate term set S.

In the first, we combine all the terms from the provided relevant documents R into a single term pool. Treating the available expansion documents as a single unit provides a means of selecting expansion terms that are characteristic of the set of relevant documents as a whole. Furthermore, it avoids the question of how expansion terms from different documents should be combined. We call this approach METHOD1.

In the second approach, a set S_d is constructed separately for each relevant document d. Here, term weights are first calculated for each set S_d independently, using one of the weighting schemes described below. The top-ranked terms from each set are then added to the original query by selecting the top terms in an interleaved fashion. Preference is given to terms that occur in over half of the expansion documents.
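The two constructions of the candidate term set can be illustrated with a minimal Python sketch. It is not the implementation used for the submitted runs: the whitespace tokenizer, the function names, the `weight` callback, the alphabetical tie-breaking, and the reading of the preference rule (terms occurring in more than half of the documents are moved ahead of the rest) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import zip_longest


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def method1_candidates(relevant_docs):
    """METHOD1: pool the terms of all relevant documents into one set S."""
    pool = Counter()
    for doc in relevant_docs:
        pool.update(tokenize(doc))
    return pool


def method2_expansion(relevant_docs, weight, k):
    """METHOD2: rank each S_d separately, then interleave the rankings."""
    per_doc = [Counter(tokenize(doc)) for doc in relevant_docs]

    # Number of expansion documents in which each term occurs.
    doc_count = Counter(term for counts in per_doc for term in counts)

    # Rank the terms of each document by the supplied weight
    # (ties broken alphabetically, an assumption of this sketch).
    rankings = [
        sorted(counts, key=lambda t: (-weight(t, counts), t))
        for counts in per_doc
    ]

    # Round-robin: the best term of every document, then the second best, ...
    interleaved = []
    for tier in zip_longest(*rankings):
        for term in tier:
            if term is not None and term not in interleaved:
                interleaved.append(term)

    # Assumed reading of the preference rule: terms found in more than
    # half of the documents come first, keeping the interleaved order.
    half = len(relevant_docs) / 2
    preferred = [t for t in interleaved if doc_count[t] > half]
    others = [t for t in interleaved if doc_count[t] <= half]
    return (preferred + others)[:k]


def raw_tf(term, counts):
    """Placeholder weight: raw within-document term frequency."""
    return counts[term]


docs = ["apple banana apple", "banana cherry cherry", "banana date"]
pooled = method1_candidates(docs)
expansion = method2_expansion(docs, raw_tf, k=3)
```

Here `raw_tf` merely stands in for whichever weighting scheme is applied to each S_d; any function of a term and its per-document counts can be passed instead.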
The second approach ensures that the expansion terms are sourced from a variety of documents; in the first method, terms from a small subset of the relevant documents can dominate. We call this approach METHOD2.

Once the candidate term sets are constructed, term weighting approaches are used to rank and select the final expansion terms. In our submitted runs, we make use of a TF × IDF approach. Here, a term’s weight is calculated as the product of its occurrence frequency within the set S and its inverse document