Counterfactual reasoning and learning systems: the example of computational advertising

This work shows how to leverage causal inference to understand the behavior of complex learning systems that interact with their environment, and to predict the consequences of changes to such systems. These predictions allow both humans and algorithms to select changes that improve the short-term and long-term performance of the system. The approach is illustrated by experiments carried out on the ad placement system associated with the Bing search engine.
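One common way to predict the effect of a change from data collected under the currently deployed system is importance weighting (inverse propensity scoring). The sketch below is illustrative only. It assumes that the deployed system randomizes its choices and logs the probability of each choice it made. The two-ad setup, the click probabilities, and all function names (`collect_logs`, `ips_estimate`, and so on) are hypothetical. It is not presented as the exact estimator or data used in the Bing experiments.

```python
import random

# Hypothetical setup: two candidate ads with made-up true click probabilities.
CLICK_PROB = {"A": 0.10, "B": 0.04}

def logging_policy_prob(context, action):
    # Assumed deployed system: picks A or B uniformly at random.
    return 0.5

def new_policy_prob(context, action):
    # Hypothetical counterfactual change: always show ad A.
    return 1.0 if action == "A" else 0.0

def collect_logs(n, rng):
    """Simulate n logged decisions as (context, action, reward, logging_prob)."""
    logs = []
    for _ in range(n):
        context = None  # contexts are ignored in this toy example
        action = rng.choice(["A", "B"])
        reward = 1.0 if rng.random() < CLICK_PROB[action] else 0.0
        logs.append((context, action, reward, logging_policy_prob(context, action)))
    return logs

def ips_estimate(logs, target_prob):
    """Importance-weighted estimate of the expected reward under target_prob.

    Requires logging_prob > 0 wherever target_prob > 0; otherwise the
    logged data carry no information about that action.
    """
    total = 0.0
    for context, action, reward, p_log in logs:
        total += target_prob(context, action) / p_log * reward
    return total / len(logs)

rng = random.Random(0)
logs = collect_logs(200_000, rng)
estimate = ips_estimate(logs, new_policy_prob)
baseline = ips_estimate(logs, logging_policy_prob)
print(f"estimated click rate if we always showed A: {estimate:.4f}")
print(f"estimated click rate of the deployed policy: {baseline:.4f}")
```

The estimate for the "always show A" policy is computed without ever deploying that policy. It should land near the true value of 0.10, and the deployed policy's estimate should land near 0.07. The weights `target_prob / p_log` make this possible. The same weights also inflate variance when the new policy favors actions the old one rarely took.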
