Visual interactive failure analysis: supporting users in information retrieval evaluation

Measurement is key to scientific progress. This is particularly true for research on complex systems, whether natural or human-built. Multilingual and multimedia information access systems, such as search engines, are increasingly complex: they need to satisfy diverse user needs and support challenging tasks. Their development calls for proper evaluation methodologies to ensure that they meet the expected user requirements and provide the desired effectiveness. In this context, failure analysis is crucial for understanding the behaviour of complex systems. Unfortunately, it is an especially challenging activity, requiring vast amounts of human effort to inspect a system's output query by query in order to understand what went well or badly. It is therefore essential to provide automated tools for examining system behaviour, both visually and analytically. Moreover, once the reason behind a failure is understood, a "what-if" analysis is still needed to determine which of the possible solutions is most promising and effective before actually modifying the system. This paper provides an analytical model for examining the performance of IR systems, based on the discounted cumulative gain family of metrics, together with a visualization for interacting with and exploring the performance of the system under examination. Moreover, we propose a machine learning approach that learns the ranking model of the examined system, making it possible to conduct a "what-if" analysis and visually explore what would happen if a given solution were adopted, before actually implementing it.
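As a minimal sketch of the metric family the analytical model builds on (not the paper's own implementation), the following Python computes DCG and its normalised variant nDCG for one ranked result list. The log2(rank + 1) discount used here is one common variant of the family; others discount only ranks beyond a threshold.

```python
import math

def dcg(gains, k=None):
    """Discounted cumulative gain: each document's graded relevance is
    discounted by the log of its rank, so early results count more.
    With 0-based enumeration, rank i gets discount log2(i + 2)."""
    gains = gains[:k] if k is not None else gains
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains, k=None):
    """Normalised DCG: the run's DCG divided by the DCG of the ideal
    (descending) ordering of the same gains, giving a value in [0, 1]."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

# Illustrative graded relevance of the top 5 documents for one query
run = [3, 2, 3, 0, 1]
print(f"DCG@5  = {dcg(run, 5):.3f}")   # raw discounted gain
print(f"nDCG@5 = {ndcg(run, 5):.3f}")  # normalised against the ideal run
```

Comparing a run's curve against the ideal one query by query is what makes the metric useful for failure analysis: queries whose nDCG falls far below 1 are the candidates worth inspecting.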
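The abstract does not name the learner used to approximate the examined system's ranking model, so the sketch below is purely an illustration of the "what-if" idea under assumed choices: a surrogate regressor (scikit-learn's GradientBoostingRegressor) is fitted to scores the black-box system is presumed to have produced, then queried again after a hypothetical feature change. All features and data here are invented.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical setup: X holds per-(query, document) features, y holds the
# scores the examined system actually assigned. Fitting a surrogate to
# those scores approximates the system's (black-box) ranking function.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                              # 8 invented ranking features
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=1000)    # stand-in system scores

surrogate = GradientBoostingRegressor().fit(X, y)

# "What-if": hypothetically boost feature 3 (say, a term-proximity signal)
# for one query's candidate documents and inspect how the ranking shifts,
# without touching the real system.
candidates = X[:20].copy()
before = np.argsort(-surrogate.predict(candidates))
candidates[:, 3] += 1.0                                     # the hypothesised modification
after = np.argsort(-surrogate.predict(candidates))
print("ranking changed at", np.sum(before != after), "of 20 positions")
```

Re-ranking through the surrogate and recomputing DCG/nDCG on the new ordering is what lets the analyst estimate whether a candidate fix is promising before implementing it.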
