Offline Evaluation and Optimization for Interactive Systems

Evaluating and optimizing an interactive system (such as a search engine, recommender, or advertising system) from historical data against a predefined online metric is challenging, especially when that metric is computed from user feedback such as clicks and payments. The key challenge is counterfactual in nature: we only observe a user's feedback for the actions the system actually took, and we do not know how that user would have reacted to a different action. The gold standard for evaluating such metrics of a user-interacting system is the online A/B experiment (a.k.a. randomized controlled experiment), which can be expensive in both time and engineering resources. Offline evaluation/optimization (sometimes referred to as off-policy learning in the literature) thus becomes critical: it aims to evaluate the same metrics without running (many) expensive A/B experiments on live users. One approach to offline evaluation is to build a user model that simulates user behavior (clicks, purchases, etc.) under various contexts, and then evaluate a system's metrics against this simulator. While straightforward and common in practice, such model-based approaches are only as reliable as the user model they are built on, and it is often difficult to know a priori whether a user model is good enough to be trusted. Recent years have seen growing interest in another solution to the offline evaluation problem. Using statistical techniques such as importance sampling and doubly robust estimation, this approach can give unbiased estimates of metrics for a wide range of problems. It enjoys other benefits as well: it often allows data scientists to obtain a confidence interval for the estimate to quantify the amount of uncertainty, and it does not require building user models, so it is more robust and easier to apply. All of these benefits make the approach particularly attractive, and successful applications have been reported in the last few years by several industry leaders. This tutorial reviews the basic theory and representative techniques. Applications of these techniques are illustrated through several case studies done at Microsoft and Yahoo!.
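To make the importance-sampling idea concrete, below is a minimal sketch of an inverse propensity scoring (IPS) estimator with a normal-approximation confidence interval. It is not the tutorial's own code; the data layout (logged tuples of context, action, reward, and logging propensity) and the function names are illustrative assumptions.

```python
import numpy as np

def ips_estimate(logged_data, target_policy):
    """IPS estimate of a target policy's expected reward from logged feedback.

    logged_data: iterable of (context, action, reward, propensity) tuples,
        where `propensity` is the logging policy's probability of the
        logged action given the context (assumed known and positive).
    target_policy: function mapping (context, action) to the probability
        that the new policy would choose that action in that context.
    """
    # Reweight each logged reward by how much more (or less) likely the
    # target policy is to take the logged action than the logging policy was.
    weighted_rewards = np.array([
        reward * target_policy(context, action) / propensity
        for context, action, reward, propensity in logged_data
    ])
    estimate = weighted_rewards.mean()
    # A rough 95% confidence interval quantifies the estimator's uncertainty.
    stderr = weighted_rewards.std(ddof=1) / np.sqrt(len(weighted_rewards))
    return estimate, (estimate - 1.96 * stderr, estimate + 1.96 * stderr)
```

A doubly robust estimator follows the same pattern but adds a reward model's prediction as a baseline, correcting it with the importance-weighted residual so the estimate stays unbiased whenever either the propensities or the reward model is accurate.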
