OpenSearch: Lessons Learned from an Online Evaluation Campaign

We report on our experience with TREC OpenSearch, an online evaluation campaign that enabled researchers to evaluate their experimental retrieval methods using real users of a live website. Specifically, we focus on the task of ad hoc document retrieval within the academic search domain, and work with two search engines, CiteSeerX and SSOAR, that provide us with traffic. We describe our experimental platform, which is based on the living labs methodology, and report on the experimental results obtained. We also share our experiences, challenges, and the lessons learned from running this track in 2016 and 2017.
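The platform behind such a campaign pairs a participant's experimental ranking with the site's production ranking and lets real user clicks decide which system wins each impression. As a rough illustration of how this kind of interleaved comparison works, here is a minimal sketch of team-draft interleaving, the style of online comparison commonly used in living-labs evaluation; the function names, defaults, and click-crediting scheme below are illustrative assumptions, not the track's actual implementation.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=10):
    """Combine two rankings with team-draft interleaving.

    Returns the interleaved result list and a parallel list recording
    which system ('A' or 'B') contributed each position, so that clicks
    observed on the live site can later be credited to either system.
    """
    a, b = list(ranking_a), list(ranking_b)   # work on copies
    interleaved, teams, seen = [], [], set()
    count_a = count_b = 0
    while len(interleaved) < length and (a or b):
        # The team that has contributed fewer documents picks next;
        # ties are broken by a coin flip.
        pick_a = count_a < count_b or (count_a == count_b and random.random() < 0.5)
        if pick_a and not a:
            pick_a = False
        elif not pick_a and not b:
            pick_a = True
        doc = a.pop(0) if pick_a else b.pop(0)
        if doc in seen:               # skip documents already placed
            continue
        seen.add(doc)
        interleaved.append(doc)
        teams.append('A' if pick_a else 'B')
        if pick_a:
            count_a += 1
        else:
            count_b += 1
    return interleaved, teams


def credit_clicks(teams, clicked_positions):
    """Count clicks per team for one impression of an interleaved list."""
    wins_a = sum(1 for i in clicked_positions if teams[i] == 'A')
    wins_b = sum(1 for i in clicked_positions if teams[i] == 'B')
    return wins_a, wins_b


# Example usage: interleave an experimental and a production ranking,
# then credit hypothetical clicks at positions 0 and 2 of the shown list.
shown, teams = team_draft_interleave(['d1', 'd2', 'd3', 'd4'],
                                     ['d3', 'd1', 'd5', 'd6'])
print(shown, teams, credit_clicks(teams, [0, 2]))
```

Aggregated over many impressions, the per-impression click credits yield the relative outcomes (wins, losses, ties) on which an online campaign of this kind reports its results.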
