MM: A New Framework for Multidimensional Evaluation of Search Engines

In this paper, we propose a framework for evaluating information retrieval systems in the presence of multidimensional relevance. This is an important problem in tasks such as consumer health search, where the understandability and trustworthiness of information strongly influence the decisions people make based on search engine results, yet common topicality-only evaluation measures ignore these aspects. We use synthetic and real data to compare the proposed framework, named MM, with understandability-biased information evaluation (UBIRE), an existing framework used in consumer health search. We show how the proposed approach diverges from the UBIRE framework, and how MM can be used to better understand the trade-offs between topical relevance and the other relevance dimensions.
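To make the motivating problem concrete, the following minimal sketch shows how a topicality-only measure can score two rankings identically even though they differ in understandability. It uses rank-biased precision (RBP) as the topicality-only measure. The function names, the toy rankings, the persistence value p = 0.8, and the product weighting are illustrative assumptions; they are not the formulation of MM, which the abstract does not specify.

```python
def rbp(gains, p=0.8):
    """Rank-biased precision: (1 - p) * sum over ranks k of p^(k-1) * gain_k."""
    return (1 - p) * sum(g * p**k for k, g in enumerate(gains))


def weighted_rbp(topical, other, p=0.8):
    """Hypothetical variant: each topical gain is scaled by a second
    dimension (here understandability, in [0, 1]) before discounting."""
    return rbp([t * o for t, o in zip(topical, other)], p)


# Toy data (assumed): both rankings have the same topical relevance,
# but ranking B returns documents that are much harder to understand.
topical_a = [1, 1, 0, 1]
topical_b = [1, 1, 0, 1]
underst_a = [1.0, 1.0, 1.0, 1.0]
underst_b = [0.2, 0.2, 1.0, 0.2]

# The topicality-only scores cannot tell the two rankings apart.
score_a = rbp(topical_a)
score_b = rbp(topical_b)

# Taking understandability into account separates them.
weighted_a = weighted_rbp(topical_a, underst_a)
weighted_b = weighted_rbp(topical_b, underst_b)

print(score_a, score_b, weighted_a, weighted_b)
```

Scoring each relevance dimension separately, as in the last step, is one simple way to expose the trade-off that a single topicality-only number hides.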
