Counterfactual Evaluation and Learning for Search, Recommendation and Ad Placement

Online metrics measured through A/B tests have become the gold standard for many evaluation questions. But can we obtain the same answers as A/B tests without actually fielding a new system? And can we train systems to optimize online metrics without subjecting users to an online learning algorithm? This tutorial summarizes and unifies the emerging body of methods on counterfactual evaluation and learning. These counterfactual techniques provide a well-founded way to evaluate and optimize online metrics by exploiting logs of past user interactions. In particular, the tutorial unifies the causal inference, information retrieval, and machine learning views of this problem, providing the basis for future research in this emerging area of great potential impact. Supplementary material and resources are available online at http://www.cs.cornell.edu/~adith/CfactSIGIR2016.
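
The workhorse of this literature is off-policy estimation from logged interactions. As a minimal illustrative sketch (not code from the tutorial itself), the Python snippet below implements the classic inverse propensity scoring (IPS) estimator; the function name, data layout, and toy numbers are assumptions chosen for exposition.

# Sketch of the inverse propensity scoring (IPS) estimator for
# counterfactual evaluation from logged user interactions.
# All names and the toy data below are illustrative assumptions.
import numpy as np

def ips_estimate(rewards, logged_propensities, target_propensities):
    """Estimate the expected reward of a new (target) policy from logs
    collected under a logging policy, by reweighting each logged
    interaction with the ratio of action probabilities."""
    weights = target_propensities / logged_propensities
    return np.mean(weights * rewards)

# Toy log: per interaction, the observed reward (e.g., a click), the
# probability the logging policy assigned to the chosen action, and
# the probability the new policy would assign to that same action.
rewards = np.array([1.0, 0.0, 1.0, 0.0])
logged_p = np.array([0.5, 0.2, 0.1, 0.7])
target_p = np.array([0.8, 0.1, 0.3, 0.4])

print(ips_estimate(rewards, logged_p, target_p))

IPS is unbiased only when the logging policy assigns nonzero probability to every action the new policy might take, and its variance can be large; controlling that variance (e.g., via clipping, self-normalization, or doubly robust estimation) is a central theme of the methods the tutorial covers.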
