Flexible Online Repeated Measures Experiment

Online controlled experiments, now commonly known as A/B testing, are crucial to causal inference and data-driven decision making in many internet-based businesses. While a simple comparison between a treatment (the feature under test) and a control (often the current standard) provides a starting point for identifying the cause of a change in a Key Performance Indicator (KPI), it is often insufficient: the change we wish to detect may be small, and the inherent variation in the data may obscure movements in the KPI. To have sufficient power to detect statistically significant changes in a KPI, an experiment needs to engage a sufficiently large proportion of traffic to the site and also to last for a sufficiently long duration. This limits the number of candidate variations that can be evaluated and the speed of new feature iterations. We introduce more sophisticated experimental designs, specifically the repeated measures design, including the crossover design and related variants, to increase KPI sensitivity with the same traffic size and experiment duration. In this paper we present FORME (Flexible Online Repeated Measures Experiment), a flexible and scalable framework for these designs. We evaluate the theoretical basis, design considerations, practical guidelines and big data implementation. We compare FORME to an existing methodology, the mixed effect model, and demonstrate why FORME is more flexible and scalable. We present empirical results based on both simulation and real data. Our method is widely applicable to online experimentation, improving sensitivity in detecting movements in KPIs and increasing experimentation capability.
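To illustrate why a crossover design can increase sensitivity with the same traffic and duration, here is a minimal Monte Carlo sketch. It is not FORME and not the paper's analysis. It assumes a hypothetical additive model (a persistent user effect, a period effect, a constant treatment effect `delta`, and Gaussian noise), and all names (`simulate_user`, `compare`, `sigma_u`, `sigma_e`, `delta`) are illustrative. It compares a parallel (between-user) design with a balanced two-period crossover in which every user serves as their own control.

```python
import random
import statistics

# Hypothetical additive model (an assumption, not FORME's model):
#   y[i, p] = u_i + period[p] + delta * treated[i, p] + e[i, p]
# u_i ~ N(0, sigma_u^2) is a persistent user effect; e ~ N(0, sigma_e^2) is noise.


def simulate_user(rng, arms, sigma_u, sigma_e, period, delta):
    """Return one user's KPI in each period; arms is e.g. ("T", "C")."""
    u = rng.gauss(0.0, sigma_u)  # shared by all of this user's periods
    return [
        u + period[p] + (delta if arm == "T" else 0.0) + rng.gauss(0.0, sigma_e)
        for p, arm in enumerate(arms)
    ]


def parallel_estimate(rng, n_users, **model):
    """Between-user design: each user stays in one arm for both periods."""
    half = n_users // 2
    treat = [statistics.fmean(simulate_user(rng, ("T", "T"), **model)) for _ in range(half)]
    ctrl = [statistics.fmean(simulate_user(rng, ("C", "C"), **model)) for _ in range(half)]
    return statistics.fmean(treat) - statistics.fmean(ctrl)


def crossover_estimate(rng, n_users, **model):
    """Two-period crossover: half the users get T then C, the other half C then T."""
    half = n_users // 2
    # Within-user differences (treated minus control) cancel u_i.
    d_tc = [y[0] - y[1] for y in (simulate_user(rng, ("T", "C"), **model) for _ in range(half))]
    d_ct = [y[1] - y[0] for y in (simulate_user(rng, ("C", "T"), **model) for _ in range(half))]
    # Averaging the two balanced sequences cancels the period effect.
    return (statistics.fmean(d_tc) + statistics.fmean(d_ct)) / 2


def compare(n_reps=1000, n_users=100, sigma_u=2.0, sigma_e=1.0, delta=0.1, seed=0):
    """Monte Carlo mean and variance of each estimator with equal traffic and duration."""
    rng = random.Random(seed)
    model = dict(sigma_u=sigma_u, sigma_e=sigma_e, period=(0.0, 0.5), delta=delta)
    results = {}
    for name, estimator in (("parallel", parallel_estimate), ("crossover", crossover_estimate)):
        est = [estimator(rng, n_users, **model) for _ in range(n_reps)]
        results[name] = (statistics.fmean(est), statistics.variance(est))
    return results


if __name__ == "__main__":
    for name, (mean, var) in compare().items():
        print(f"{name:9s} mean={mean:.3f} variance={var:.4f}")
```

Under this assumed model the theoretical variances are 4(σ_u² + σ_e²/2)/N for the parallel design and 2σ_e²/N for the crossover, roughly 0.18 and 0.02 with the defaults, so the simulated variances should land near those values. The gain comes entirely from differencing out the user effect u_i: it shrinks when σ_u is small relative to σ_e, and the sketch ignores carryover effects, which a real crossover design must account for.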