Analyses of multiple-evidence combinations for retrieval strategies

Multiple-evidence techniques are often touted as a means of improving retrieval system effectiveness. Belkin et al. [1] examined the effects of combining various query representations. Fox et al. [2] proposed several combination algorithms and found that combining runs of the same type (long and short queries within the vector space model) did not yield improvement and sometimes even degraded performance; they did achieve improvement over the individual runs when merging different retrieval strategies (e.g., vector space and p-norm Boolean). Lee [3] further examined various combination algorithms for fusing result sets to improve effectiveness. He identified that, for multiple evidence to improve system effectiveness, the retrieved sets must have higher overlap among their relevant documents than among their non-relevant documents. Lee did not identify the exact margin required; in his experiments, there was a 125% difference between the relevant and non-relevant overlap. While Lee's experiments fused result sets from different systems, we focus on the ranking strategies themselves, removing the systemic differences introduced by parsers, stemmers, phrase processing, and weighting factors. We show that the improvements Lee observed were likely produced by fusing ranking strategies that were less tuned than today's measures, and that current improvements are more likely produced by systemic differences than by differences in ranking strategy.
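To make the quantities discussed above concrete, the following minimal sketch implements two fusion methods in the style of the CombSUM and CombMNZ algorithms commonly attributed to Fox et al. [2], along with a pairwise overlap ratio in the style Lee [3] used, assumed here to be 2|A∩B|/(|A|+|B|) computed separately over the relevant and non-relevant documents of two retrieved sets. The run data and function names are hypothetical and for illustration only; scores are assumed to be already normalized.

```python
from collections import defaultdict

def comb_sum(runs):
    """CombSUM-style fusion: a document's fused score is the sum of
    its (normalized) scores across all runs that retrieved it."""
    fused = defaultdict(float)
    for run in runs:                      # each run: {doc_id: score}
        for doc, score in run.items():
            fused[doc] += score
    return dict(fused)

def comb_mnz(runs):
    """CombMNZ-style fusion: CombSUM multiplied by the number of runs
    that retrieved the document, rewarding agreement across runs."""
    sums, hits = defaultdict(float), defaultdict(int)
    for run in runs:
        for doc, score in run.items():
            sums[doc] += score
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}

def overlap_ratio(set_a, set_b):
    """Assumed Lee-style overlap: 2 * |A ∩ B| / (|A| + |B|)."""
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))

# Hypothetical runs: document id -> normalized similarity score.
run1 = {"d1": 0.9, "d2": 0.7, "d3": 0.4}
run2 = {"d1": 0.8, "d2": 0.6, "d4": 0.5}

print(comb_sum([run1, run2]))   # d1: 1.7, d2: 1.3, d3: 0.4, d4: 0.5
print(comb_mnz([run1, run2]))   # d1 and d2 rewarded for appearing in both

# Lee's criterion: fusion tends to help when the relevant overlap
# exceeds the non-relevant overlap.
relevant = {"d1", "d2"}          # hypothetical relevance judgments
rel_a, rel_b = set(run1) & relevant, set(run2) & relevant
non_a, non_b = set(run1) - relevant, set(run2) - relevant
print(overlap_ratio(rel_a, rel_b))   # 1.0: runs agree on relevant docs
print(overlap_ratio(non_a, non_b))   # 0.0: runs disagree on non-relevant docs
```

In this toy example the relevant overlap (1.0) exceeds the non-relevant overlap (0.0), so the agreement bonus in the CombMNZ-style method boosts the relevant documents d1 and d2; this is the condition Lee identified as necessary for fusion to improve effectiveness.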