Randomized graph cluster randomization

The global average treatment effect (GATE) is a primary quantity of interest in causal inference under network interference. With a correctly specified exposure model of the interference, the Horvitz-Thompson (HT) and Hájek estimators of the GATE are unbiased and consistent, respectively, yet they are known to exhibit extreme variance under many designs and in many settings of interest. With a fixed clustering of the interference graph, graph cluster randomization (GCR) designs have been shown to greatly reduce variance compared to node-level random assignment, but the variance is often still prohibitively large. In this work we propose a randomized version of the GCR design, descriptively named randomized graph cluster randomization (RGCR), which uses a random clustering rather than a single fixed clustering. By considering an ensemble of many different cluster assignments, this design avoids a key problem with GCR, namely that a given node can be "lucky" or "unlucky" in a given clustering. We propose two randomized graph decomposition algorithms for use with RGCR, randomized 3-net and 1-hop-max, adapted from prior work on multiway graph cut problems. When integrating over their own randomness, these algorithms furnish network exposure probabilities that can be estimated efficiently. We develop upper bounds on the variance of the HT estimator of the GATE under assumptions on the metric structure of the interference graph. Whereas the best known variance upper bound for the HT estimator under a GCR design is exponential in the parameters of the metric structure, we give a comparable variance upper bound under RGCR that is instead polynomial in the same parameters. We provide extensive simulations comparing RGCR and GCR designs, observing substantial reductions in the mean squared error of both the HT and Hájek estimators of the GATE in a variety of settings.
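The idea of integrating over a random clustering can be illustrated with a minimal sketch. The abstract does not spell out the algorithms, so everything below is an assumption made for illustration: the 1-hop-max rule is read as "each node draws a uniform random number and joins the cluster of the largest draw in its closed 1-hop neighborhood", clusters are treated independently with probability `p`, exposure is taken to mean that a node and all of its neighbors are treated, and the exposure probabilities are approximated by naive Monte Carlo rather than by any efficient procedure from the paper. All function names are hypothetical.

```python
import random


def one_hop_max_clustering(adj, rng):
    """Label each node by the node with the largest random draw in its
    closed 1-hop neighborhood (an assumed reading of 1-hop-max)."""
    draw = {v: rng.random() for v in adj}
    return {v: max([v, *adj[v]], key=lambda u: draw[u]) for v in adj}


def assign_treatment(label, rng, p=0.5):
    """Treat each cluster independently with probability p; every node
    inherits the assignment of its cluster."""
    cluster_z = {c: rng.random() < p for c in sorted(set(label.values()))}
    return {v: cluster_z[c] for v, c in label.items()}


def fully_exposed(adj, z, v, treated=True):
    """Assumed exposure model: v and all of its neighbors share the
    same assignment `treated`."""
    return all(z[u] == treated for u in [v, *adj[v]])


def estimate_exposure_probs(adj, n_draws=2000, seed=0, p=0.5):
    """Naive Monte Carlo estimate of each node's probability of full
    treatment exposure, averaging over BOTH the random clustering and
    the random cluster-level assignment."""
    rng = random.Random(seed)
    counts = {v: 0 for v in adj}
    for _ in range(n_draws):
        label = one_hop_max_clustering(adj, rng)  # fresh clustering each draw
        z = assign_treatment(label, rng, p)
        for v in adj:
            if fully_exposed(adj, z, v):
                counts[v] += 1
    return {v: c / n_draws for v, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: a path on four nodes, as an adjacency dict.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(estimate_exposure_probs(path))
```

Because a new clustering is drawn on every iteration, no node is tied to one favorable or unfavorable partition; its estimated exposure probability is an average over the whole ensemble, which is the quantity an HT-style estimator would divide by.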
