Improving the sensitivity of online controlled experiments by utilizing pre-experiment data

Online controlled experiments are at the heart of making data-driven decisions at a diverse set of companies, including Amazon, eBay, Facebook, Google, Microsoft, Yahoo, and Zynga. Small differences in key metrics, on the order of fractions of a percent, may have very significant business implications. At Bing it is not uncommon to see experiments that impact annual revenue by millions of dollars, even tens of millions of dollars, either positively or negatively. With thousands of experiments being run annually, improving the sensitivity of experiments allows for more precise assessment of value or, equivalently, allows experiments to be run on smaller populations (supporting more experiments) or for shorter durations (improving the feedback cycle and agility). We propose an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity. This technique is applicable to a wide variety of key business metrics, and it is practical and easy to implement. The results on Bing's experimentation system are very successful: we can reduce variance by about 50%, effectively achieving the same statistical power with only half of the users, or half the duration.
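As a concrete illustration of the core idea, the sketch below applies the standard CUPED-style linear adjustment Y_adj = Y - theta * (X - mean(X)) with theta = cov(X, Y) / var(X), where X is the same metric measured in the pre-experiment period. This is a minimal sketch in Python on synthetic data; the function name cuped_adjust and all numbers are illustrative assumptions rather than details taken from the abstract, which does not spell out the adjustment formula.

    import numpy as np

    def cuped_adjust(y, x):
        # Adjust the experiment-period metric y using the pre-experiment
        # covariate x (the same metric measured before the experiment started).
        # theta = cov(x, y) / var(x) minimizes the variance of the adjusted
        # metric; centering x keeps the adjusted metric unbiased for E[y].
        theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
        return y - theta * (x - np.mean(x))

    # Synthetic data for illustration only: a pre-experiment metric x and a
    # correlated experiment-period metric y for 100,000 users.
    rng = np.random.default_rng(0)
    x = rng.normal(10.0, 3.0, size=100_000)
    y = 0.8 * x + rng.normal(0.0, 1.0, size=100_000)
    y_adj = cuped_adjust(y, x)

    # The adjusted metric has (nearly) the same mean but a much smaller
    # variance, which is what drives the sensitivity gains described above.
    print(np.mean(y), np.mean(y_adj))
    print(np.var(y, ddof=1), np.var(y_adj, ddof=1))

With this linear adjustment the variance of y_adj is approximately var(y) * (1 - corr(x, y)^2), so the reported reduction of about 50% corresponds to a pre-experiment covariate that is strongly correlated with the experiment-period metric.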
