Controlled experiments on the web: survey and practical guide

The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT) and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end-users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person’s Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.
