Paper Information - Online Controlled Experiments and A/B Tests

Online Controlled Experiments and A/B Tests

The internet connectivity of client software (e.g., apps running on phones and PCs), websites, and online services provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control/treatment tests, and online field experiments. Unlike most data mining techniques for finding correlational patterns, controlled experiments allow establishing a causal relationship with high probability. Experimenters can use the Scientific Method to pose a hypothesis of the form “If a specific change is introduced, will it improve key metrics?” and evaluate it with real users. The theory of a controlled experiment dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, and the topic of offline experiments is well developed in Statistics (Box 2005). Online controlled experiments started to be used in the late 1990s with the growth of the Internet. Today, many large sites, including Amazon, Bing, Facebook, Google, LinkedIn, and Yahoo!, run thousands to tens of thousands of experiments each year, testing user interface (UI) changes, enhancements to algorithms (search, ads, personalization, recommendation, etc.), changes to apps, content management systems, etc. Online controlled experiments are now considered an indispensable tool, and their use is growing among startups and smaller websites. Controlled experiments are especially useful in combination with Agile software development (Martin 2008, Rubin 2012), Steve Blank’s Customer Development process (Blank 2005), and MVPs (Minimum Viable Products) popularized by Eric Ries’s Lean Startup (Ries 2011).

Motivation and Background

Many good resources are available with motivation and explanations about online controlled experiments (Siroker and Koomen 2013, Goward 2012, McFarland 2012, Schrage 2014, Kohavi, Longbotham and Sommerfield, et al. 2009, Kohavi, Deng and Longbotham, et al. 2014, Kohavi, Deng and Frasca, et al. 2013).

Ron Kohavi | R. Longbotham

[1] E. C. Fieller. Some Problems in Interval Estimation, 1954.

[2] P. Bickel, et al. An Analysis of Transformations Revisited, 1981.

[3] P. Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses, 2005.

[4] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[5] Joseph G. Pigeon, et al. Statistics for Experimenters: Design, Innovation and Discovery, 2006, Technometrics.

[6] Mike Moran. Do It Wrong Quickly: How the Web Changes the Old Marketing Rules, 2007.

[7] M. Kenward, et al. An Introduction to the Bootstrap, 2007.

[8] Ron Kohavi, et al. Responsible editor: R. Bayardo, 2022.

[9] Robert C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship, 2008.

[10] R. Porcher, et al. P Value and the Theory of Hypothesis Testing: An Explanation for New Researchers, 2010, Clinical Orthopaedics and Related Research.

[11] Ashish Agarwal, et al. Overlapping experiment infrastructure: more, better, faster experimentation, 2010, KDD.