Online Experimentation for Information Retrieval

Online experimentation for information retrieval (IR) focuses on the insights that can be gained from users' interactions with IR systems, such as web search engines. Its most common form, A/B testing, is widely used in practice and has helped sustain the continuous improvement of the current generation of these systems.
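In a typical A/B test, user traffic is split between a control system (A) and a treatment system (B), and a user metric such as click-through rate is compared across the two groups. A minimal sketch of the resulting analysis, using hypothetical log counts and a standard two-proportion z-test (not any specific system's pipeline), might look like:

```python
import math

def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rates
    between a control (A) and a treatment (B) group."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    # pooled rate under the null hypothesis of no difference
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF, via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 20k impressions per arm
z, p = two_proportion_ztest(clicks_a=1040, n_a=20000,
                            clicks_b=1150, n_b=20000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In practice, production experiments add variance-reduction and multiple-testing safeguards on top of such a basic significance check, but the core comparison has this shape.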
