Bucket testing, also known as A/B testing, is a practice that is widely used by on-line sites with large audiences: in a simple version of the methodology, one evaluates a new feature on the site by exposing it to a very small fraction of the total user population and measuring its effect on this exposed group. For traditional uses of this technique, uniform independent sampling of the population is often enough to produce an exposed group that can serve as a statistical proxy for the full population.
In on-line social network applications, however, one often wishes to perform a more complex test: evaluating a new social feature that will only produce an effect if a user and some number of his or her friends are exposed to it. In this case, independent uniform draws from the population are unlikely to produce groups that contain users together with their friends, and so the construction of the sample must take the network structure into account. This leads quickly to challenging combinatorial problems, since there is an inherent tension between producing enough correlation to select users together with their friends and preserving enough uniformity and independence that the selected group is a reasonable sample of the full population.
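To make the difficulty with independent sampling concrete, the following is a minimal sketch of the underlying arithmetic. It assumes a simplified model that is not taken from the paper: every user is placed in the bucket independently with the same rate p, and we ask how likely it is that one particular user and at least k of that user's friends all land in the bucket. The function name and parameters (prob_user_with_friends, p, degree, k) are illustrative only.

```python
from math import comb


def prob_user_with_friends(p: float, degree: int, k: int) -> float:
    """Probability that a given user AND at least k of their `degree` friends
    are selected, when every user is sampled independently with rate p.

    Assumed model (illustrative, not from the paper): independent Bernoulli(p)
    inclusion for every user.
    """
    # Probability that fewer than k friends are sampled: Binomial(degree, p) < k.
    fewer_than_k = sum(
        comb(degree, i) * p**i * (1 - p) ** (degree - i) for i in range(k)
    )
    # The user must be sampled (probability p) and at least k friends must be too.
    return p * (1 - fewer_than_k)


if __name__ == "__main__":
    # Hypothetical sampling rates for a user with 100 friends who needs 2 of them.
    for rate in (0.001, 0.01, 0.1):
        print(rate, prob_user_with_friends(rate, degree=100, k=2))
```

Under this model the probability shrinks much faster than the sampling rate itself as the bucket gets smaller, which is why a small uniform bucket contains few users whose friends are also present.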
Here we develop an algorithmic framework for bucket testing in a network that addresses these challenges. First we describe a novel walk-based sampling method for producing samples of nodes that are internally well-connected but also approximately uniform over the population. Then we show how a collection of multiple independent subgraphs constructed this way can yield reasonable samples for testing. We demonstrate the effectiveness of our algorithms through computational experiments on large portions of the Facebook network.
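The abstract does not spell out the walk-based method, so the sketch below is not the authors' algorithm. It shows a standard, well-known stand-in, the Metropolis-Hastings random walk, which illustrates how a walk on the friendship graph can visit nodes approximately uniformly while each step stays on a graph edge. The adjacency-list format, the function name metropolis_hastings_walk, and the toy graph are all assumptions made for illustration.

```python
import random
from collections import Counter


def metropolis_hastings_walk(adj, start, steps, rng):
    """Return the nodes visited by a Metropolis-Hastings random walk.

    adj maps each node to a list of its neighbours (undirected, connected graph).
    A uniformly random neighbour is proposed at each step and accepted with
    probability min(1, deg(current) / deg(candidate)); this correction makes
    the walk's stationary distribution uniform over nodes rather than
    proportional to degree. Illustrative stand-in, not the paper's method.
    """
    current = start
    visited = [current]
    for _ in range(steps):
        candidate = rng.choice(adj[current])
        accept = min(1.0, len(adj[current]) / len(adj[candidate]))
        if rng.random() < accept:
            current = candidate
        # On rejection the walk stays put and the current node is counted again.
        visited.append(current)
    return visited


if __name__ == "__main__":
    # Hypothetical toy graph: a hub (node 0) with four neighbours, plus edge 1-2.
    toy = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
    trace = metropolis_hastings_walk(toy, start=0, steps=100_000, rng=random.Random(0))
    counts = Counter(trace)
    for node in sorted(toy):
        print(node, counts[node] / len(trace))
```

Because consecutive visited nodes are either identical or adjacent, the set of nodes along a walk is connected in the graph, which is one simple way a walk-based sample can keep users together with some of their friends. How to trade this connectivity off against uniformity, and how to combine several such samples, is the subject of the framework described above.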