Evaluation Measures for Relevance and Credibility in Ranked Lists

Recent discussions of alternative facts, fake news, and post-truth politics have motivated research on technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Although technology exists for filtering information according to relevance and/or credibility, no single measure currently exists for evaluating the accuracy or precision (and, more generally, the effectiveness) of both the relevance and the credibility of retrieved results. One obvious approach is to measure relevance and credibility effectiveness separately, and then consolidate the two measures into one. There are at least two problems with such an approach: (I) it is not certain that the same criteria are applied to the evaluation of both relevance and credibility (and applying different criteria introduces bias into the evaluation); (II) many more and richer measures exist for assessing relevance effectiveness than for assessing credibility effectiveness (hence risking further bias). Motivated by the above, we present two novel types of evaluation measures designed to measure the effectiveness of both relevance and credibility in ranked lists of retrieval results. Experimental evaluation on a small human-annotated dataset (which we make freely available to the research community) shows that our measures are expressive and intuitive in their interpretation.
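The "obvious approach" criticized above can be made concrete with a minimal sketch: compute a standard ranking measure (here nDCG) separately over relevance labels and over credibility labels, then take a weighted mean. The function names, labels, and the 0.5 weight below are illustrative assumptions, not the paper's proposed measures; the sketch only shows the baseline consolidation strategy whose shortcomings motivate this work.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with the standard log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(labels):
    """nDCG: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0

def consolidated_ndcg(relevance, credibility, weight=0.5):
    """Naive consolidation: a weighted mean of two separately computed
    nDCG scores. This is the baseline approach the abstract critiques,
    not one of the two measures the paper introduces."""
    return weight * ndcg(relevance) + (1 - weight) * ndcg(credibility)

# Hypothetical graded labels (0-3) for one ranked list of five documents.
rel = [3, 2, 3, 0, 1]
cred = [1, 3, 0, 2, 2]
score = consolidated_ndcg(rel, cred)
```

Note that the two component scores are normalized against different ideal rankings, so the consolidated value has no single ideal-list interpretation, which is one symptom of the bias problems (I) and (II) noted above.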
