On the informativeness of cascade and intent-aware effectiveness measures

The Maximum Entropy Method provides one technique for validating search engine effectiveness measures. Under this method, the value of an effectiveness measure is used as a constraint to estimate the most likely distribution of relevant documents under a maximum entropy assumption. The inferred distribution can then be compared to the actual distribution to quantify the "informativeness" of the measure, and it can also be used to estimate the values of other effectiveness measures. Previous work focused on traditional effectiveness measures, such as average precision. In this paper, we extend the Maximum Entropy Method to the newer cascade and intent-aware effectiveness measures by accounting for dependencies among the documents ranked in a results list. These measures are intended to reflect the novelty and diversity of search results in addition to traditional relevance. Our results indicate that intent-aware measures based on the cascade model are informative, both at inferring the actual distribution of relevant documents and at predicting the values of other retrieval measures.
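To make the inference step concrete, the following is a minimal sketch of the Maximum Entropy Method for a measure that is linear in the per-rank relevance probabilities, such as rank-biased precision. The function name maxent_relevance and all parameter values are illustrative assumptions, not the authors' implementation: it maximizes the entropy of independent per-rank relevance probabilities subject to the observed measure value.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_relevance(n, measure_value, weights, total_relevant=None):
    """Hypothetical sketch: infer maximum-entropy relevance probabilities
    p[i] (the probability that the document at rank i+1 is relevant),
    constrained so that a linear measure, sum(weights * p), equals the
    observed measure value."""
    def neg_entropy(p):
        # Sum of binary entropies across ranks, negated for minimization;
        # ranks are treated as independent under the maximum entropy prior.
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))

    constraints = [{'type': 'eq',
                    'fun': lambda p: np.dot(weights, p) - measure_value}]
    if total_relevant is not None:
        # Optional extra constraint: expected number of relevant documents.
        constraints.append({'type': 'eq',
                            'fun': lambda p: np.sum(p) - total_relevant})

    result = minimize(neg_entropy, np.full(n, 0.5),  # uninformative start
                      bounds=[(0.0, 1.0)] * n,
                      constraints=constraints, method='SLSQP')
    return result.x

# Example: constrain rank-biased precision (persistence beta = 0.8) to an
# observed value of 0.45 over the top 20 ranks, then read off the inferred
# probability of relevance at each rank.
beta, n = 0.8, 20
rbp_weights = (1 - beta) * beta ** np.arange(n)
print(np.round(maxent_relevance(n, 0.45, rbp_weights), 3))
```

For cascade measures such as expected reciprocal rank, the expected measure value is no longer linear in the per-rank probabilities, since each rank's contribution depends on the relevance of the documents above it; the same constrained-optimization machinery applies, but with a nonlinear constraint function reflecting those dependencies.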
