A/B testing plays an important role in modern applications, particularly on the internet, as it helps businesses optimize their user experience to maximize usage and profits. In this paper we discuss A/B testing on counts (such as the number of searches by a user), and point out the importance of using appropriate distributions for statistical analysis. We discuss the use of the negative binomial (NB) distribution to evaluate performance instead of the commonly used Poisson distribution. Our motivating application is A/B testing on the number of searches by users of an internet search engine made by the company SweetIM. We demonstrate the inappropriateness of the standard Poisson assumption for these data, and show that the conclusions from analyses of specific A/A and A/B tests run in this application under the NB assumption differ from those under an incorrect Poisson assumption. Using a normal approximation, we describe a general property of NB tests: the existence of a bound on testing power, which is independent of the mean expected usage (or the length of time the test is run) under typical assumptions. This leads to the disconcerting conclusion that such tests cannot guarantee high statistical power for identifying a small difference between the A and B groups, no matter how long they are run. This is in sharp contrast to "standard" tests with binomial, normal or Poisson assumptions, where any desired power can be attained by running the test long enough, as long as the A and B groups differ. We also describe and apply a permutation test as a non-parametric approach to testing. In our view, the non-parametric approach is an important complement to parametric tests like NB inference, because it is valid for testing the most general null hypothesis of equality between the A and B distributions, without assuming anything about the form of these distributions.
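The power bound described above can be illustrated with a small numerical sketch. This is not the paper's own derivation; it assumes the common NB parameterization with variance m + m²/k, a dispersion k that stays fixed as the mean grows, equal group sizes n, a relative effect delta between the groups, and a two-sided z-test based on the normal approximation. The function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_NORM = NormalDist()


def approx_power(mu, delta, n, k=None, alpha=0.05):
    """Normal-approximation power of a two-sided test comparing mean counts.

    Group A has mean mu, group B has mean mu * (1 + delta), n users each.
    k is the assumed NB dispersion (variance = m + m**2 / k);
    k=None means Poisson (variance = m).
    """
    mu_a, mu_b = mu, mu * (1 + delta)

    def var(m):
        return m if k is None else m + m * m / k

    se = sqrt((var(mu_a) + var(mu_b)) / n)   # std. error of the mean difference
    shift = (mu_b - mu_a) / se               # standardized true difference
    z_crit = _NORM.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in either tail.
    return _NORM.cdf(shift - z_crit) + _NORM.cdf(-shift - z_crit)


def nb_power_limit(delta, n, k, alpha=0.05):
    """Limit of approx_power as mu grows, under the NB assumption above.

    The mu**2 / k terms dominate the variance, so mu cancels out of the
    standardized difference, leaving delta * sqrt(n * k / (1 + (1 + delta)**2)).
    """
    shift = delta * sqrt(n * k / (1 + (1 + delta) ** 2))
    z_crit = _NORM.inv_cdf(1 - alpha / 2)
    return _NORM.cdf(shift - z_crit) + _NORM.cdf(-shift - z_crit)


if __name__ == "__main__":
    delta, n, k = 0.02, 10_000, 0.5   # hypothetical values for illustration
    for mu in (1, 10, 100, 1000):
        print(mu,
              round(approx_power(mu, delta, n, k=k), 4),   # NB: levels off
              round(approx_power(mu, delta, n), 4))        # Poisson: tends to 1
    print("NB limit:", round(nb_power_limit(delta, n, k), 4))
```

Under these assumptions, increasing the mean (for example by running the test longer) pushes the Poisson power toward 1, while the NB power approaches a ceiling that depends only on delta, n and k. In this simplified setting, only more users or less overdispersion raises that ceiling.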