Whole-session evaluation of interactive information retrieval systems: Compilation of Homework

This paper points out a number of major challenges in the evaluation of interactive IR. The main problems identified with current approaches are: (i) user and task models are not adequately captured; (ii) information continually changes over time; (iii) IIR tasks are often very complex, and thus hard to model as they evolve, and may not have fixed endpoints; and (iv) IIR often occurs over time and across sessions. While the paper doesn't provide any concrete solutions to these problems, the most promising suggestion is the use of a "living laboratory". The development of such a living lab, open to researchers, would certainly provide a number of ways to evaluate users in the wild, overcoming some of the pragmatic problems typically associated with evaluation.

These two works suggest that we should focus on the sequence in which users experience, encounter, and process information. Bookstein tries to model the retrieval process as a sequence in order to develop a better retrieval system (and is perhaps a precursor to the Interactive Probability Ranking Principle). Tague-Sutcliffe, on the other hand, tries to measure the informativeness of the process (where informativeness is akin to the novelty and diversity measures being developed). Key in these works is the focus on the order in which users examine documents.

This paper provides a novel and potentially interesting solution to evaluation across a session. In some respects this work blends developments in HCI with IR: it takes a GOMS-like approach, following Card and Moran, together with Dunlop's work on time, relevance, and interaction modelling, to produce a "probabilistic GOMS" for IR, in which the main actions in the search process are each assigned a time, and a probability is assigned to these actions. This provides an interesting way to examine and explore a range of potential interactions with the system, as a way to cater for the variety of ways that users interact with systems (a minimal sketch of the mechanics is given below).

In terms of evaluating the whole session, I have been particularly interested in developing measures that examine how well a user uses an application. The fundamental idea is that what should be evaluated is the sequence of interactions and documents that the user examines and inspects during the process (i.e. following on from Bookstein and Tague-Sutcliffe, along with Norman's idea that the user experience is defined by the sequence of interactions); the second sketch below illustrates such a sequence-based measure. The experience, whether it be engagement, utility, fun, etc., at any particular point …
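To make the "probabilistic GOMS" idea concrete, here is a minimal sketch in Python. The action names, times, and transition probabilities are all illustrative assumptions, not values from the paper; the point is only to show how assigning a time and a probability to each action lets one sample, and thereby explore, a range of plausible interaction sequences.

```python
import random

# Illustrative sketch of a "probabilistic GOMS" for search.
# All action names, times, and probabilities below are assumed
# placeholders, not values taken from the paper.

ACTION_TIME = {            # seconds per action (assumed)
    "query": 10.0,
    "assess_snippet": 2.0,
    "read_document": 30.0,
}

# Assumed transition probabilities between actions (each row sums to 1).
TRANSITIONS = {
    "query":          [("assess_snippet", 1.0)],
    "assess_snippet": [("read_document", 0.3),
                       ("assess_snippet", 0.5),
                       ("query", 0.2)],
    "read_document":  [("assess_snippet", 0.6),
                       ("query", 0.4)],
}

def simulate_session(budget=300.0):
    """Sample one interaction sequence until the time budget runs out."""
    action, elapsed, trace = "query", 0.0, []
    while elapsed + ACTION_TIME[action] <= budget:
        elapsed += ACTION_TIME[action]
        trace.append(action)
        r, cumulative = random.random(), 0.0
        for nxt, p in TRANSITIONS[action]:
            cumulative += p
            if r <= cumulative:
                action = nxt
                break
    return trace, elapsed

if __name__ == "__main__":
    trace, elapsed = simulate_session()
    print(f"{len(trace)} actions in {elapsed:.0f}s:", trace[:8], "...")
```

Sampling many such sessions yields a distribution over interaction sequences and their time costs, which is the sense in which the approach caters for the variety of ways users interact with a system.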
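In the same spirit, here is a minimal sketch of a sequence-based measure, loosely in the style of Tague-Sutcliffe's informativeness and session-based discounted cumulated gain. The gain values, the logarithmic discount, and the rule that a document only earns credit the first time it is examined are all assumptions for illustration, not the measures from the works cited above.

```python
import math

def sequence_gain(examined, relevance, log_base=2.0):
    """Discounted gain over the order in which documents are examined.

    A document contributes only the first time it is seen (a crude
    novelty model); re-examinations still consume a position and so
    push later gains further down the discount.
    """
    seen, total = set(), 0.0
    for position, doc in enumerate(examined, start=1):
        if doc in seen:
            continue  # no credit for re-examining a document
        seen.add(doc)
        total += relevance.get(doc, 0) / math.log(position + 1, log_base)
    return total

# Two users examine the same documents, in different orders
# (graded relevance values are assumed for illustration).
relevance = {"d1": 3, "d2": 0, "d3": 2}
print(sequence_gain(["d1", "d2", "d3"], relevance))  # relevant docs early: 4.00
print(sequence_gain(["d2", "d3", "d1"], relevance))  # relevant docs late: ~2.76
```

Even though both users inspect the same set of documents, the measure rewards the one who encounters the relevant material earlier, which is exactly the order-sensitivity that Bookstein and Tague-Sutcliffe argue a whole-session evaluation needs.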

[1] Doug Downey, et al. Understanding the relationship between searchers' queries and information goals, 2008, CIKM '08.

[2] Noriko Kando, et al. Using Concept Map to Evaluate Learning by Searching, 2012, CogSci.

[3] Robert Villa, et al. Interaction Pool: Towards a user-centred test collection, 2007.

[4] Susan T. Dumais, et al. Evaluation Challenges and Directions for Information-Seeking Support Systems, 2009, Computer.

[5] Arjen P. de Vries, et al. Supporting children's web search in school environments, 2012, IIiX.

[6] Jacek Gwizdka, et al. Distribution of cognitive load in Web search, 2010, J. Assoc. Inf. Sci. Technol.

[7] Charles L. A. Clarke, et al. Time-based calibration of effectiveness measures, 2012, SIGIR '12.

[8] Yvonne Kammerer, et al. Signpost from the masses: learning effects in an exploratory social tag search browser, 2009, CHI.

[9] J. Schneider, et al. The Janus Faced Scholar: A Festschrift in Honour of Peter Ingwersen, 2010.

[10] Kalervo Järvelin, et al. Time drives interaction: simulating sessions in diverse searching environments, 2012, SIGIR '12.

[11] Morten Hertzum, et al. Browsing and querying in online documentation: a study of user interfaces and the interaction process, 1996, TCHI.

[12] Kalervo Järvelin. Interactive relevance feedback with graded relevance and sentence extraction: simulated user experiments, 2009, CIKM.

[13] Young-In Song, et al. Click the search button and be happy: evaluating direct and immediate information access, 2011, CIKM '11.

[14] Yiming Yang, et al. Modeling Expected Utility of Multi-session Information Distillation, 2009, ICTIR.

[15] Pertti Vakkari, et al. Exploratory Searching As Conceptual Exploration, 2010.

[16] Noriko Kando, et al. A method to capture information encountering embedded in exploratory Web searches, 2011, Inf. Res.

[17] Balder ten Cate, et al. Question Answering: From Partitions to Prolog, 2002, TABLEAUX.

[18] Wei Chu, et al. Modeling the impact of short- and long-term behavior on search personalization, 2012, SIGIR '12.

[19] Information Interaction in Context 2012, IIiX '12, Nijmegen, The Netherlands, August 21-24, 2012, IIiX.

[20] Florian Metze, et al. Spoken Web Search, 2011, MediaEval.

[21] Leif Azzopardi. The economics in interactive information retrieval, 2011, SIGIR.

[22] Benjamin S. Bloom, et al. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives, 2000.

[23] Joseph Y. Halpern, et al. Knowledge, probability, and adversaries, 1989, PODC '89.

[24] Norbert Fuhr, et al. A probability ranking principle for interactive information retrieval, 2008, Information Retrieval.

[25] Hideo Joho, et al. Constraint Can Affect Human Perception, Behaviour, and Performance of Search, 2012, ICADL.

[26] R. Wilson. On the evaluation of …, 1940.

[27] Noriko Kando, et al. Differences between informational and transactional tasks in information seeking on the web, 2008, IIiX.

[28] J. R. van Ossenbruggen, et al. Implicit relevance feedback from a multi-step search process: a use of query-logs, 2011.

[29] Kalervo Järvelin, et al. Information interaction in molecular medicine: integrated use of multiple channels, 2010, IIiX.

[30] Lois M. L. Delcambre, et al. Discounted Cumulated Gain Based Evaluation of Multiple-Query IR Sessions, 2008, ECIR.

[31] Rosie Jones, et al. Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs, 2008, CIKM '08.

[32] Norbert Fuhr. A probability ranking principle for interactive information retrieval, 2008.

[33] Nicholas J. Belkin, et al. Report on the SIGIR workshop on "entertain me": supporting complex search tasks, 2012, SIGIR Forum.

[34] Nicholas J. Belkin, et al. On the evaluation of interactive information retrieval systems, 2010.

[35] Susan T. Dumais, et al. Evaluating implicit measures to improve the search experiences, 2003.

[36] Arjen P. de Vries, et al. Explaining Query Modifications - An Alternative Interpretation of Term Addition and Removal, 2012, ECIR.

[37] Noriko Kando, et al. Connecting Qualitative and Quantitative Analysis of Web Search Process: Analysis Using Search Units, 2010, AIRS.

[38] Pia Borlund, et al. The IIR evaluation model: a framework for evaluation of interactive information retrieval systems, 2003, Inf. Res.

[39] Noriko Kando, et al. Using a concept map to evaluate exploratory search, 2010, IIiX.

[40] Abigail Sellen, et al. "It's simply integral to what I do": enquiries into how the web is weaved into everyday life, 2012, WWW.

[41] Ryen W. White, et al. Modeling and analysis of cross-session search tasks, 2011, SIGIR.

[42] Joseph Y. Halpern, et al. Reasoning about knowledge: a survey, 1995.

[43] Barteld P. Kooi, et al. Probabilistic Dynamic Epistemic Logic, 2003, J. Log. Lang. Inf.

[44] J. Liu, et al. Usefulness as the Criterion for Evaluation of Interactive Information Retrieval, 2009.

[45] Barbara M. Wildemuth, et al. The effects of domain knowledge on search tactic formulation, 2004, J. Assoc. Inf. Sci. Technol.

[46] Ben Carterette, et al. Overview of the TREC 2011 Session Track, 2011, TREC.

[47] Benjamin S. Bloom, et al. Taxonomy of Educational Objectives: The Classification of Educational Goals, 1957.

[48] Jean Tague-Sutcliffe. Measuring the informativeness of a retrieval process, 1992, SIGIR '92.

[49] Nicholas J. Belkin, et al. "entertain me": Supporting Complex Search Tasks, 2011.

[50] Fabio Crestani, et al. The Troubles with Using a Logical Model of IR on a Large Collection of Documents, 1995, TREC.

[51] Stephen E. Robertson, et al. On the Evaluation of IR Systems, 1992, Inf. Process. Manag.

[52] Leif Azzopardi. Usage based effectiveness measures: monitoring application performance in information retrieval, 2009, CIKM.

[53] Max L. Wilson, et al. A comparison of techniques for measuring sensemaking and learning within participant-generated summaries, 2013, J. Assoc. Inf. Sci. Technol.

[54] Abraham Bookstein, et al. Information retrieval: A sequential learning process, 1983, J. Am. Soc. Inf. Sci.

[55] Norbert Fuhr, et al. Using eye-tracking with dynamic areas of interest for analyzing interactive information retrieval, 2012, SIGIR '12.

[56] Arjen P. de Vries, et al. Semantic search log analysis: A method and a study on professional image search, 2011, J. Assoc. Inf. Sci. Technol.

[57] Ben Carterette, et al. Adapting Query Expansion to Search Proficiency, 2012.

[58] Diane Kelly, et al. Methods for Evaluating Interactive Information Retrieval Systems with Users, 2009, Found. Trends Inf. Retr.

[59] Franz Baader, et al. A Description Logic Based Approach to Reasoning about Web Services, 2005, WWW 2005.

[60] Maarten Marx, et al. Evaluation Methods for Rankings of Facetvalues for Faceted Search, 2011, CLEF.

[61] Noriko Kando, et al. Evaluation of Interactive Information Access System using Concept Map, 2011, EVIA@NTCIR.