Online Controlled Experiments and A/B Tests

The internet connectivity of client software (e.g., apps running on phones and PCs), websites, and online services provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control/treatment tests, and online field experiments. Unlike most data mining techniques for finding correlational patterns, controlled experiments allow a causal relationship to be established with high probability. Experimenters can use the scientific method to form a hypothesis of the form “If a specific change is introduced, will it improve key metrics?” and evaluate it with real users.

The theory of controlled experiments dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, and the topic of offline experiments is well developed in statistics (Box 2005). Online controlled experiments started to be used in the late 1990s with the growth of the Internet. Today, many large sites, including Amazon, Bing, Facebook, Google, LinkedIn, and Yahoo!, run thousands to tens of thousands of experiments each year, testing user interface (UI) changes, enhancements to algorithms (search, ads, personalization, recommendation, etc.), changes to apps, content management systems, and so on. Online controlled experiments are now considered an indispensable tool, and their use is growing among startups and smaller websites. Controlled experiments are especially useful in combination with Agile software development (Martin 2008, Rubin 2012), Steve Blank’s Customer Development process (Blank 2005), and MVPs (Minimum Viable Products) popularized by Eric Ries’s Lean Startup (Ries 2011).

Motivation and Background

Many good resources are available with motivation and explanations about online controlled experiments (Siroker and Koomen 2013; Goward 2012; McFarland 2012; Schrage 2014; Kohavi, Longbotham and Sommerfield, et al. 2009; Kohavi, Deng and Longbotham, et al. 2014; Kohavi, Deng and Frasca, et al. 2013).
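To make the control/treatment idea concrete, the following is a minimal sketch of one common way such an experiment might be run: users are split deterministically into two variants by hashing, and the conversion rates of the two groups are compared with a two-proportion z-test. This is an illustration under stated assumptions, not the procedure of any particular site or of the resources cited above. The function names (`assign_variant`, `two_proportion_z_test`, `simulate`), the experiment name, and the conversion rates are all hypothetical, and the data are simulated rather than real.

```python
import hashlib
import math
import random


def assign_variant(user_id: str, experiment: str = "demo-exp") -> str:
    """Deterministically map a user to 'control' or 'treatment' (about 50/50).

    Hashing the experiment name together with the user ID keeps a user's
    assignment stable across visits. Names here are hypothetical.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"


def two_proportion_z_test(conv_c: int, n_c: int, conv_t: int, n_t: int):
    """Return (absolute lift, z statistic, two-sided p-value).

    Uses the pooled standard error under the null hypothesis of equal rates.
    Assumes the pooled rate is strictly between 0 and 1.
    """
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return p_t - p_c, z, p_value


def simulate(num_users: int = 20000, base_rate: float = 0.10,
             treatment_rate: float = 0.12, seed: int = 0):
    """Simulate conversions; the rates are made-up illustrative values."""
    rng = random.Random(seed)
    counts = {"control": [0, 0], "treatment": [0, 0]}  # [conversions, users]
    for i in range(num_users):
        variant = assign_variant(f"user-{i}")
        rate = treatment_rate if variant == "treatment" else base_rate
        counts[variant][1] += 1
        if rng.random() < rate:
            counts[variant][0] += 1
    return counts


counts = simulate()
lift, z, p_value = two_proportion_z_test(
    counts["control"][0], counts["control"][1],
    counts["treatment"][0], counts["treatment"][1],
)
print(f"lift={lift:.4f} z={z:.2f} p={p_value:.4g}")
```

Because assignment is random with respect to user characteristics, a difference in the metric between the two groups can be attributed to the change itself rather than to a confounding factor, which is what distinguishes this setup from a purely correlational analysis.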

[1] E. C. Fieller. Some Problems in Interval Estimation, 1954.

[2] P. Bickel, et al. An Analysis of Transformations Revisited, 1981.

[3] P. Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses, 2005.

[4] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[5] Joseph G. Pigeon, et al. Statistics for Experimenters: Design, Innovation and Discovery, 2006, Technometrics.

[6] Mike Moran. Do It Wrong Quickly: How the Web Changes the Old Marketing Rules, 2007.

[7] M. Kenward, et al. An Introduction to the Bootstrap, 2007.

[8] Ron Kohavi, et al. Responsible editor: R. Bayardo, 2022.

[9] Robert C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship, 2008.

[10] R. Porcher, et al. P Value and the Theory of Hypothesis Testing: An Explanation for New Researchers, 2010, Clinical Orthopaedics and Related Research.

[11] Ashish Agarwal, et al. Overlapping experiment infrastructure: more, better, faster experimentation, 2010, KDD.

[12] Ron Kohavi, et al. Online Experiments: Practical Lessons, 2010, Computer.

[13] R. Longbotham, et al. Choice of the Randomization Unit in Online Controlled Experiment, 2011.

[14] Ron Kohavi, et al. Unexpected results in online controlled experiments, 2011, SKDD.

[15] Ron Kohavi, et al. Trustworthy online controlled experiments: five puzzling outcomes explained, 2012, KDD.

[16] Kenneth S. Rubin, et al. Essential Scrum: A Practical Guide to the Most Popular Agile Process, 2012.

[17] Steve Blank. The Four Steps to the Epiphany: Successful Strategies for Products that Win, 2013.

[18] Jon M. Kleinberg, et al. Graph cluster randomization: network exposure to multiple universes, 2013, KDD.

[19] Ron Kohavi, et al. Improving the sensitivity of online controlled experiments by utilizing pre-experiment data, 2013, WSDM.

[20] James V. Stone. Bayes' Rule: A Tutorial Introduction to Bayesian Analysis, 2013.

[21] Dan Siroker, et al. A/B Testing: The Most Powerful Way to Turn Clicks Into Customers, 2013.

[22] Ron Kohavi, et al. Online controlled experiments at large scale, 2013, KDD.

[23] Ron Kohavi, et al. Seven rules of thumb for web site experimenters, 2014, KDD.

[24] The Innovator's Hypothesis: How Cheap Experiments Are Worth More than Good Ideas, 2014.

[25] Robert L. Wolpert, et al. Statistical Inference, 2019, Encyclopedia of Social Network Analysis and Mining.

[26] Alex Deng, et al. Diluted Treatment Effect Estimation for Trigger Analysis in Online Controlled Experiments, 2015, WSDM.

[27] The Innovator's Hypothesis: How Cheap Experiments Are Worth More Than Good Ideas by Michael Schrage, 2015.

[28] Brijesh Singh, et al. The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses, 2016.