Understandability Biased Evaluation for Information Retrieval

Although relevance is known to be a multidimensional concept, information retrieval measures mainly consider one dimension of relevance: topicality. In this paper we propose a method to integrate multiple dimensions of relevance into the evaluation of information retrieval systems. This is done within the gain-discount evaluation framework, which underlies measures such as rank-biased precision (RBP), cumulative gain, and expected reciprocal rank. Although the proposal is general and applicable to any dimension of relevance, we study specific instantiations of the approach in the context of evaluating retrieval systems with respect to both the topicality and the understandability of retrieved documents. This leads to the formulation of understandability-biased evaluation measures based on RBP. We study these measures using both simulated experiments and real human assessments. The findings show that considering both understandability and topicality in the evaluation of retrieval systems leads to claims about system effectiveness that differ from those obtained when considering topicality alone.
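One natural instantiation of the idea described above is to fold a second relevance dimension into the gain component of RBP, multiplying the topical gain at each rank by an understandability gain before applying the usual geometric discount. The sketch below illustrates this; the function names, the multiplicative combination of the two gains, and the persistence value are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of an understandability-biased, RBP-style measure within the
# gain-discount framework. Assumption: the gain at rank k is the product
# of topical relevance r_k and understandability u_k, both in [0, 1];
# names (rbp, urbp, persistence) are illustrative.

def rbp(gains, persistence=0.8):
    """Standard rank-biased precision: (1 - p) * sum_k g_k * p^(k-1)."""
    return (1 - persistence) * sum(
        g * persistence ** k for k, g in enumerate(gains)
    )

def urbp(relevance, understandability, persistence=0.8):
    """Understandability-biased variant: gain at rank k is r_k * u_k."""
    gains = [r * u for r, u in zip(relevance, understandability)]
    return rbp(gains, persistence)

# Example: identical topical relevance, different understandability.
rel = [1, 1, 0, 1]
easy = [1.0, 1.0, 1.0, 1.0]  # all retrieved documents understandable
hard = [0.2, 1.0, 1.0, 0.2]  # first and last documents hard to read
print(urbp(rel, easy))  # coincides with plain RBP on rel
print(urbp(rel, hard))  # lower: hard-to-read documents contribute less
```

With uniformly understandable results the measure reduces to plain RBP, while rankings that place hard-to-read documents at relevant positions score lower, which is how the two dimensions can lead to different conclusions about system effectiveness than topicality alone.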
