Investigating the exhaustivity dimension in content-oriented XML element retrieval evaluation

INEX, the evaluation initiative for content-oriented XML retrieval, has since its establishment defined the relevance of an element along two graded dimensions: exhaustivity and specificity. The former measures how exhaustively an XML element discusses the topic of request, whereas the latter measures how focused the element is on that topic. The rationale for having two dimensions was to provide a more stable measure of relevance than asking assessors to rate the relevance of an element on a single scale. However, obtaining relevance assessments is a costly task, as each document must be judged for relevance by a human assessor. In XML retrieval this problem is exacerbated, since the elements of the document must also be assessed with respect to the exhaustivity and specificity dimensions. An ongoing discussion in INEX has been whether such a sophisticated definition of relevance, and in particular the exhaustivity dimension, is actually needed. This paper attempts to answer this question through extensive statistical tests comparing the conclusions about system performance that would be reached under different assessment scenarios.
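To make the notion of an assessment scenario concrete, the sketch below illustrates how the two graded dimensions might be quantised into a single relevance value, and how two runs could then be compared with a paired significance test under different scenarios. This is a minimal illustration, not the paper's actual evaluation protocol: the 0-3 scales, the quantisation weights, the per-topic scores, and the two runs are all assumptions made for the example.

```python
import random
from statistics import mean

# Hedged sketch: collapsing INEX's two graded relevance dimensions
# (exhaustivity e, specificity s, assumed here to be on 0-3 scales)
# into a single relevance value via quantisation functions, then
# comparing two hypothetical runs under different scenarios.

def quant_strict(e: int, s: int) -> float:
    # Only fully exhaustive AND fully specific elements count as relevant.
    return 1.0 if e == 3 and s == 3 else 0.0

def quant_generalised(e: int, s: int) -> float:
    # Graded credit from both dimensions (equal weighting is an assumption).
    return (e / 3.0 + s / 3.0) / 2.0

def quant_specificity_only(e: int, s: int) -> float:
    # A cheaper scenario that drops the exhaustivity dimension altogether.
    return s / 3.0

def paired_randomisation_test(xs, ys, trials=10_000, seed=0):
    # Two-sided paired randomisation test on per-topic score differences:
    # under the null hypothesis the sign of each difference is arbitrary,
    # so flip signs at random and count how often the mean difference is
    # at least as extreme as the observed one.
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(xs, ys)]
    observed = abs(mean(diffs))
    extreme = sum(
        abs(mean([d if rng.random() < 0.5 else -d for d in diffs])) >= observed
        for _ in range(trials)
    )
    return extreme / trials

# Hypothetical per-topic effectiveness scores for two runs, computed once
# with the full two-dimensional assessments and once ignoring exhaustivity.
scores = {
    "full assessments": ([0.42, 0.55, 0.38, 0.61, 0.47, 0.52],
                         [0.40, 0.50, 0.41, 0.58, 0.44, 0.49]),
    "specificity only": ([0.44, 0.53, 0.40, 0.60, 0.46, 0.51],
                         [0.43, 0.52, 0.42, 0.59, 0.45, 0.50]),
}
for scenario, (sys_a, sys_b) in scores.items():
    p = paired_randomisation_test(sys_a, sys_b)
    diff = mean(a - b for a, b in zip(sys_a, sys_b))
    print(f"{scenario}: mean diff = {diff:+.3f}, p = {p:.3f}")
```

If the cheaper scenario yielded the same significance verdicts as the full assessments across many such pairwise comparisons, it would support the same conclusions about system performance; this kind of agreement between scenarios is what the paper's statistical tests investigate at scale.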
