Test & Roll: Profit-Maximizing A/B Tests

Marketers often use A/B tests to compare marketing treatments in a test stage and then deploy the better-performing treatment to the remainder of the consumer population. While these tests have traditionally been analyzed using hypothesis testing, we re-frame them as an explicit trade-off between the opportunity cost of the test (where some customers receive a sub-optimal treatment) and the potential losses associated with deploying a sub-optimal treatment to the remainder of the population. We derive a closed-form expression for the profit-maximizing test size and show that it is substantially smaller than the size typically recommended for a hypothesis test, particularly when the response is noisy or when the total population is small. The common practice of using small holdout groups can be rationalized by asymmetric priors. The proposed test design achieves nearly the same expected regret as the flexible, yet harder-to-implement, multi-armed bandit under a wide range of conditions. We demonstrate the benefits of the method in three different marketing contexts -- website design, display advertising, and catalog tests -- in which we estimate priors from past data. In all three cases, the optimal sample sizes are substantially smaller than for a traditional hypothesis test, resulting in higher profit.
