Comparing both relevance and robustness in selection of web ranking functions

In commercial search engines, a ranking function is selected for deployment mainly by comparing the relevance measurements over candidates. In this paper we suggest to select Web ranking functions according to both their relevance and robustness to the changes that may lead to relevance degradation over time. We argue that the ranking robustness can be effectively measured by taking into account the ranking score distribution across Web pages. We then improve NDCG with two new metrics and show their superiority in terms of stability to ranking score turbulence and stability in function selection.