Paper Information - Tie Breaker: A Novel Way of Combining Retrieval Signals

Tie Breaker: A Novel Way of Combining Retrieval Signals

Empirical studies of information retrieval suggest that the effectiveness of a retrieval function is closely related to how it combines multiple retrieval signals, including term frequency, inverse document frequency, and document length. Although it is relatively easy to capture how each signal contributes to the relevance score, it is more challenging to find the best way of combining these signals, since they often interact with each other in complicated ways. As a result, when deriving a retrieval function from traditional retrieval models, the choice of one implementation over the others has often been made based on empirical observations rather than sound theoretical derivations. In this paper, we propose a novel way of combining retrieval signals to derive robust retrieval functions. Instead of seeking an integrated way of combining these signals into a complex mathematical retrieval function, our main idea is to prioritize the retrieval signals, apply the strongest signal first to rank documents, and then iteratively use the weaker signals to break ties among documents with the same score. One unique advantage of our method is that it eliminates the need for complicated implementations of the signals and enables a simple yet elegant way of combining multiple signals for document ranking. Empirical results show that the proposed method achieves performance comparable to state-of-the-art retrieval functions on traditional TREC ad hoc retrieval collections, and can outperform them on TREC microblog retrieval collections.
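The tie-breaking idea described in the abstract can be illustrated with a minimal Python sketch. Ranking by the strongest signal and then repeatedly breaking ties with weaker ones is equivalent to a lexicographic sort on a tuple of signals. The abstract does not state which signals are used at each step or in what order, so the priority order below (number of matched query terms, then summed IDF, then summed term frequency, then shorter document length) is an assumption made purely for illustration, as are the names `DocSignals` and `tie_breaker_rank` and the toy values.

```python
from typing import List, NamedTuple


class DocSignals(NamedTuple):
    """Per-document signals for one query (hypothetical container)."""
    doc_id: str
    matched_terms: int  # number of distinct query terms found in the document
    idf_sum: float      # sum of IDF values of the matched query terms
    tf_sum: int         # total occurrences of query terms in the document
    length: int         # document length in tokens


def tie_breaker_rank(docs: List[DocSignals]) -> List[DocSignals]:
    """Rank documents by prioritized signals.

    Assumed priority (not taken from the paper): matched_terms, idf_sum,
    tf_sum (all descending), then length (ascending). A weaker signal is
    only consulted when all stronger signals are exactly tied, which is
    what a lexicographic sort on the key tuple does.
    """
    return sorted(
        docs,
        key=lambda d: (-d.matched_terms, -d.idf_sum, -d.tf_sum, d.length),
    )


# Toy data, invented for this example.
example_docs = [
    DocSignals("d1", matched_terms=2, idf_sum=3.5, tf_sum=4, length=120),
    DocSignals("d2", matched_terms=2, idf_sum=3.5, tf_sum=4, length=80),
    DocSignals("d3", matched_terms=1, idf_sum=2.0, tf_sum=9, length=50),
    DocSignals("d4", matched_terms=2, idf_sum=3.5, tf_sum=6, length=200),
]

ranking = [d.doc_id for d in tie_breaker_rank(example_docs)]
print(ranking)
```

In this toy data, d3 has the highest term frequency but matches fewer query terms, so it is ranked last: a weaker signal never overrides a stronger one, it only orders documents that the stronger signals cannot separate.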

Hao Wu | Hui Fang
