The A/B Testing Problem

"Randomized experiments are increasingly central to innovation in many fields. In the tech sector, major platforms run thousands of experiments (called A/B tests) each year on tens of millions of users at any given time and use the results to screen most product innovations. In the policy and academic circles, governments, nonprofit organizations, and academics use randomized control trials to evaluate social programs and shape public policy. Experiments are not only prevalent, but also highly heterogeneous in design. Policy makers and tech giants typically focus on a "go big" approach, obtaining large sample sizes for a small number of experiments to ensure they that can detect even small benefits of a policy intervention. In contrast, many start-ups and entrepreneurs take a different "go lean" approach, running many small tests and discarding any innovation without outstanding success. The idea is to quickly and cheaply experiment with many ideas, abandon or pivot from ideas that do not work, and scale up ideas that do work. In this paper, we study when each of these approaches is appropriate. To do so, we propose a new framework for optimal experimentation that we call the A/B testing problem. The frameworks also yields an optimal strategy of what innovations to implement and methods to calculate the value of data and experimentation. The key insight is that the optimal experimentation strategy depends crucially on the tails of the distribution of innovation quality, and whether these tails have "black swan" outliers, of innovations with a very large positive or negative impact. The A/B testing problem is as follows. A firm has a set of potential innovations i=1,-,I to implement. The quality ..._i of innovation i is unknown and comes from a distribution G. Quality is independently distributed across innovations. The firm selects a number of users n_i to allocate to an A/B test evaluating innovation i. 
This yields a signal with mean equal to the true quality of idea i and variance σ^2/n_i. The firm is subject to the constraint that the total number of users assigned to experiments is no greater than the number N of users available for experimentation. After seeing the realization of the signals, the firm selects a subset S of ideas to implement. The firm's objective is to maximize the expected sum of the true quality of the ideas that are implemented.
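
The problem statement above can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's method. It assumes a normal distribution for G (thin-tailed, chosen only because the posterior mean then has a closed form), normally distributed signals, and an equal split of the N users across ideas. It implements an idea whenever its posterior mean quality is positive, which maximizes the expected sum of implemented quality because ideas are independent. All function and parameter names are hypothetical; noise_var stands for the numerator of the signal variance in the text.

```python
import random


def posterior_mean(signal, n_users, prior_mean, prior_var, noise_var):
    """Posterior mean of an idea's quality given its A/B test signal.

    Assumes a normal prior N(prior_mean, prior_var) for quality and a
    normal signal with variance noise_var / n_users (both are assumptions
    made for illustration).
    """
    signal_var = noise_var / n_users
    # Weight on the signal: more users -> less noise -> more weight.
    weight = prior_var / (prior_var + signal_var)
    return weight * signal + (1.0 - weight) * prior_mean


def simulate(num_ideas, total_users, prior_mean, prior_var, noise_var, seed=0):
    """Run one draw of the A/B testing problem with an equal user split.

    Returns the indices of implemented ideas and the sum of their true
    quality. The equal split is an assumption, not an optimal allocation.
    """
    if num_ideas <= 0 or total_users < num_ideas:
        raise ValueError("need at least one user per idea")
    rng = random.Random(seed)
    # Equal allocation; num_ideas * n_users never exceeds total_users.
    n_users = total_users // num_ideas
    implemented = []
    payoff = 0.0
    for i in range(num_ideas):
        # True quality, unknown to the firm.
        quality = rng.gauss(prior_mean, prior_var ** 0.5)
        # Noisy signal centered on the true quality.
        signal = rng.gauss(quality, (noise_var / n_users) ** 0.5)
        estimate = posterior_mean(signal, n_users, prior_mean, prior_var, noise_var)
        if estimate > 0:
            implemented.append(i)
            payoff += quality
    return implemented, payoff
```

For example, `simulate(num_ideas=10, total_users=10000, prior_mean=0.0, prior_var=1.0, noise_var=100.0)` returns the implemented indices and their realized total quality for one random draw. Replacing the normal prior with a fat-tailed distribution would require a different posterior-mean computation, since the linear shrinkage formula used here holds only in the normal case.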