Online controlled experiments, now commonly known as A/B tests, are crucial to causal inference and data-driven decision making in many internet-based businesses. While a simple comparison between a treatment (the feature under test) and a control (often the current standard) provides a starting point for identifying the cause of a change in a Key Performance Indicator (KPI), it is often insufficient: the change we wish to detect may be small, and the inherent variation in the data may obscure movements in the KPI. To have sufficient power to detect statistically significant changes in a KPI, an experiment must engage a sufficiently large proportion of the site's traffic and also run for a sufficiently long duration. This limits the number of candidate variations that can be evaluated and the speed of new feature iterations. We introduce more sophisticated experimental designs, specifically the repeated measures design, including the crossover design and related variants, to increase KPI sensitivity with the same traffic size and experiment duration. In this paper we present FORME (Flexible Online Repeated Measures Experiment), a flexible and scalable framework for these designs. We cover the theoretical basis, design considerations, practical guidelines, and big data implementation. We compare FORME to an existing methodology, the mixed effects model, and demonstrate why FORME is more flexible and scalable. We present empirical results based on both simulated and real data. Our method is widely applicable to online experimentation to improve sensitivity in detecting KPI movements and to increase experimentation capability.
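The sensitivity gain of a repeated measures (crossover) design comes from differencing each user's responses across periods, which cancels the between-user random effect that a parallel A/B comparison must absorb as noise. The following Python sketch is illustrative only, not the paper's FORME implementation; the effect size, variance components, and sample size are assumed values chosen to make the variance reduction visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users = 10_000
true_effect = 0.10   # assumed treatment lift on the KPI (hypothetical)
between_sd = 1.0     # assumed between-user variation (user random effect)
within_sd = 0.5      # assumed within-user noise per observation

user_effect = rng.normal(0.0, between_sd, n_users)

# Parallel A/B design: each user is exposed to exactly one variant,
# so the user random effect stays in the between-group comparison.
in_treat = rng.random(n_users) < 0.5
y = user_effect + true_effect * in_treat + rng.normal(0.0, within_sd, n_users)
diff_ab = y[in_treat].mean() - y[~in_treat].mean()
se_ab = np.sqrt(y[in_treat].var(ddof=1) / in_treat.sum()
                + y[~in_treat].var(ddof=1) / (~in_treat).sum())

# Crossover design: every user sees both variants across two periods;
# the within-user difference cancels the user random effect entirely.
y_treat = user_effect + true_effect + rng.normal(0.0, within_sd, n_users)
y_ctrl = user_effect + rng.normal(0.0, within_sd, n_users)
paired = y_treat - y_ctrl
se_xo = paired.std(ddof=1) / np.sqrt(n_users)

print(f"parallel A/B: effect = {diff_ab:.3f}, s.e. = {se_ab:.4f}")
print(f"crossover:    effect = {paired.mean():.3f}, s.e. = {se_xo:.4f}")
```

In this toy setting the crossover standard error depends only on the within-user noise, so the larger the between-user variance is relative to it, the larger the sensitivity gain. A real crossover analysis must additionally adjust for period effects and, for irreversible treatments, carryover, which is where the design variants discussed in the paper come in.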