Optimal design of experiments in the presence of network-correlated outcomes

We consider how to assign treatment in a randomized experiment when the correlation among outcomes is informed by a network observed before the intervention. Working within the potential outcomes framework, we develop a class of models that posit such a network-based correlation structure among the outcomes, together with a strategy for allocating treatment optimally so as to minimize the integrated mean squared error of the estimated average treatment effect. An analytical decomposition of the mean squared error used as the optimization criterion yields insight into the features of optimal designs. Extensive simulations illustrate how the proposed allocation strategy improves on allocations that ignore the network structure.
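
The abstract does not specify the outcome model or the estimator, so the following Python sketch rests on assumptions introduced here for illustration and is not the authors' class of models. It assumes outcomes Y_i = mu + tau*z_i + eps_i with Gaussian noise eps ~ N(0, Sigma), a hypothetical network covariance Sigma = sigma^2 (I + rho A) built from an adjacency matrix A, the difference-in-means estimator of the average treatment effect, and designs with exactly n/2 treated units. Under these assumptions the estimator is unbiased given the assignment z, and its error is w'eps with w_i = z_i/n1 - (1 - z_i)/n0. The mean squared error of a design is therefore w' Sigma w, which splits into an independent-noise term sigma^2 w'w and a network term sigma^2 rho w'Aw. All function names are illustrative. The optimal design is found by exhaustive search and compared with the average over all balanced designs, which is the expected MSE of complete randomization that ignores the network.

```python
import itertools

import numpy as np

# Hypothetical toy model (an assumption, not the paper's model):
#   Y_i = mu + tau * z_i + eps_i,  eps ~ N(0, Sigma),  Sigma = sigma2 * (I + rho * A)


def diff_in_means_weights(z):
    """Weights w such that tau_hat - tau = w @ eps for the difference-in-means estimator."""
    z = np.asarray(z, dtype=float)
    n1, n0 = z.sum(), (1.0 - z).sum()
    return z / n1 - (1.0 - z) / n0


def mse(z, sigma):
    """MSE of difference in means given z; unbiased under the toy model, so MSE = w' sigma w."""
    w = diff_in_means_weights(z)
    return float(w @ sigma @ w)


def mse_decomposition(z, adj, sigma2, rho):
    """Split w' sigma w for sigma = sigma2 * (I + rho * adj) into
    (independent-noise term, network term)."""
    w = diff_in_means_weights(z)
    return sigma2 * float(w @ w), sigma2 * rho * float(w @ adj @ w)


def balanced_assignments(n):
    """Yield every 0/1 assignment vector with exactly n // 2 treated units."""
    for treated in itertools.combinations(range(n), n // 2):
        z = np.zeros(n)
        z[list(treated)] = 1.0
        yield z


def optimal_balanced_design(sigma):
    """Exhaustive search over balanced assignments (feasible only for small n)."""
    best_z, best_mse = None, np.inf
    for z in balanced_assignments(sigma.shape[0]):
        m = mse(z, sigma)
        if m < best_mse:
            best_z, best_mse = z, m
    return best_z, best_mse


# Hypothetical network: a cycle on 8 units with positively correlated neighbors.
n, sigma2, rho = 8, 1.0, 0.3
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
# Cycle adjacency eigenvalues lie in [-2, 2], so Sigma is positive definite for rho < 1/2.
Sigma = sigma2 * (np.eye(n) + rho * adj)

z_opt, mse_opt = optimal_balanced_design(Sigma)
mse_avg = np.mean([mse(z, Sigma) for z in balanced_assignments(n)])

print(z_opt)                    # arms alternate around the cycle
print(f"{mse_opt:.4f}")         # 0.2000
print(f"{mse_avg:.4f}")         # 0.4571 (mean over all 70 balanced designs)
print(mse_decomposition(z_opt, adj, sigma2, rho))  # (0.5, -0.3)
```

In this toy model the decomposition makes the design intuition visible: the noise term sigma^2 (1/n1 + 1/n0) is fixed by the group sizes, while the network term is smallest when positively correlated neighbors fall in opposite arms. Because w' Sigma w is linear in Sigma, averaging this MSE over a prior on Sigma reduces to evaluating it at the prior mean of Sigma; that shortcut depends on the toy model's assumptions. Exhaustive search grows as C(n, n/2), so larger networks would need a heuristic such as local swaps.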
