Tie Breaker: A Novel Way of Combining Retrieval Signals

Empirical studies of information retrieval suggest that the effectiveness of a retrieval function is closely related to how it combines multiple retrieval signals, including term frequency, inverse document frequency, and document length. Although it is relatively easy to capture how each signal contributes to the relevance score, it is more challenging to find the best way to combine these signals, since they often interact with each other in complicated ways. As a result, when deriving a retrieval function from traditional retrieval models, the choice of one implementation over the others has often been made based on empirical observations rather than sound theoretical derivations. In this paper, we propose a novel way of combining retrieval signals to derive robust retrieval functions. Instead of seeking an integrated way of combining these signals into a complex mathematical retrieval function, our main idea is to prioritize the retrieval signals, apply the strongest signal first to rank documents, and then iteratively use the weaker signals to break ties among documents with the same score. One unique advantage of our method is that it eliminates the need for complicated implementations of the signals and enables a simple yet elegant way of combining multiple signals for document ranking. Empirical results show that the proposed method can achieve performance comparable to state-of-the-art retrieval functions on traditional TREC ad hoc retrieval collections, and can outperform them on TREC microblog retrieval collections.
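The tie-breaking idea described above amounts to a lexicographic ordering of documents over a prioritized list of signals. The following is a minimal sketch under stated assumptions. It is not the authors' implementation. The signal names (`matched`, the number of distinct query terms a document contains; `tf`, a total term frequency; `len`, the document length) and their priority order are hypothetical stand-ins chosen for illustration. The sketch also assumes the signals are coarse enough for exact ties to occur, since continuous scores would rarely tie.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A document is represented here by a dict of precomputed signal values
# (hypothetical representation, for illustration only).
Doc = Dict[str, float]


def tie_breaker_rank(docs: Sequence[Tuple[str, Doc]],
                     signals: Sequence[Callable[[Doc], float]]) -> List[str]:
    """Rank documents by the first signal; later signals only break ties.

    Python compares tuples element by element, so signal k+1 is consulted
    only when signals 0..k are all equal, which is exactly a lexicographic
    (tie-breaking) order. Higher values rank first for every signal.
    """
    ranked = sorted(docs,
                    key=lambda item: tuple(s(item[1]) for s in signals),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked]


# Hypothetical documents with hypothetical signal values.
docs = [
    ("d1", {"matched": 2, "tf": 3, "len": 100}),
    ("d2", {"matched": 3, "tf": 1, "len": 50}),
    ("d3", {"matched": 2, "tf": 5, "len": 200}),
    ("d4", {"matched": 2, "tf": 5, "len": 80}),
]

# Assumed priority: strongest signal first. Length is negated so that
# shorter documents win the final tie-break.
signals = [
    lambda d: d["matched"],
    lambda d: d["tf"],
    lambda d: -d["len"],
]

ranking = tie_breaker_rank(docs, signals)
print(ranking)  # ['d2', 'd4', 'd3', 'd1']
```

Here `d2` ranks first on the strongest signal alone. The three documents tied on `matched` are separated by `tf`, and the remaining tie between `d3` and `d4` is broken by length. No weights or functional forms are needed to combine the signals, which illustrates the simplicity the abstract claims for the approach.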