Head First: Living Labs for Ad-hoc Search Evaluation

The information retrieval (IR) community strives to make evaluation more centered on real users and their needs. The living labs evaluation paradigm, i.e., observing users in their natural task environments, offers great promise in this regard. Yet, progress in an academic setting has been limited. This paper presents the first living labs benchmarking campaign initiative for the IR community, taking two use-cases as test cases: local domain search on a university website and product search on an e-commerce site. There are many challenges associated with this setting, including incorporating results from experimental search systems into live production systems and obtaining a sufficient number of impressions from relatively low-traffic sites. We propose that head queries be used to generate result lists offline, which are then interleaved with results of the production system for live evaluation. An API is developed to orchestrate the communication between commercial parties and benchmark participants. This campaign serves to advance the living labs methodology for IR evaluation and offers important insight into the role of living labs in this space.
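
The evaluation step sketched above, merging an experimental result list with the production system's list so that user clicks can be credited to one side or the other, is commonly realized with an interleaving method such as team-draft interleaving. The sketch below is a minimal illustration of that idea under our own assumptions, not the campaign's actual implementation; all function and variable names are ours.

import random

def team_draft_interleave(production, experimental, length=10):
    """Merge a production ranking and an experimental ranking into one
    result list, recording which system contributed each document so
    that subsequent clicks can be credited to the right system.

    Illustrative sketch only; the campaign's real interleaving logic
    is not specified in the abstract.
    """
    interleaved = []   # merged ranking shown to the user
    credit = {}        # doc id -> "production" or "experimental"
    pools = {"production": list(production),
             "experimental": list(experimental)}

    while len(interleaved) < length and any(pools.values()):
        # Randomly decide which system drafts first in this round.
        order = ["production", "experimental"]
        random.shuffle(order)
        for team in order:
            # Take this team's highest-ranked document not yet shown.
            while pools[team]:
                doc = pools[team].pop(0)
                if doc not in credit:
                    interleaved.append(doc)
                    credit[doc] = team
                    break
            if len(interleaved) >= length:
                break
    return interleaved, credit


# Hypothetical offline result lists for a single head query.
production_list = ["d1", "d2", "d3", "d4"]
experimental_list = ["d3", "d5", "d1", "d6"]
ranking, credit = team_draft_interleave(production_list, experimental_list, length=6)
# Clicks observed on `ranking` are attributed via `credit` to compare the two systems.

In the setting described above, a list of this kind would be generated offline for each head query, served by the production site, and the resulting click feedback returned to benchmark participants through the API.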
