Variance Reduction in Bipartite Experiments through Correlation Clustering

Causal inference in randomized experiments typically assumes that the units of randomization and the units of analysis are one and the same. In some applications, however, these two roles are played by distinct entities linked by a bipartite graph. The key challenge in such bipartite settings is how to avoid interference bias, which would typically arise if we simply randomized the treatment at the level of analysis units. One effective way of minimizing interference bias in standard experiments is cluster randomization, but this design has not been studied in the bipartite setting, where conventional clustering schemes can lead to poorly powered experiments. This paper introduces a novel clustering objective and a corresponding algorithm that partitions a bipartite graph so as to maximize the statistical power of a bipartite experiment on that graph. Whereas previous work relied on balanced partitioning, our formulation suggests the use of a correlation clustering objective. We use a publicly available graph of Amazon user-item reviews to validate our solution and show that it substantially increases the statistical power of bipartite experiments.
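
The abstract does not spell out the paper's objective or algorithm, so the sketch below only illustrates the generic ingredients it names, under loudly stated assumptions. It uses the textbook correlation clustering cost (count '+' pairs split across clusters plus '-' pairs kept together), the randomized Pivot (KwikCluster) heuristic as a swapped-in clustering method, and cluster randomization with one coin flip per cluster of randomization units (items). The '+' rule (two items share at least `min_shared` users), the "treated fraction" exposure, and every function name (`similar_pairs`, `pivot_cluster`, `disagreements`, `cluster_randomize`, `treated_fraction`) are hypothetical and are not the authors' method.

```python
import random
from itertools import combinations


def similar_pairs(edges, min_shared=1):
    """Hypothetical '+' rule (an assumption, not the paper's): two items
    (randomization units) are similar if at least `min_shared` users
    (analysis units) are linked to both. All other pairs are '-'."""
    users_of = {}
    for user, item in edges:
        users_of.setdefault(item, set()).add(user)
    items = sorted(users_of)
    plus = {
        frozenset((a, b))
        for a, b in combinations(items, 2)
        if len(users_of[a] & users_of[b]) >= min_shared
    }
    return items, plus


def pivot_cluster(items, plus, rng):
    """Randomized Pivot (KwikCluster): visit items in random order; each
    still-unclustered pivot absorbs every unclustered item it has a '+' pair with."""
    order = list(items)
    rng.shuffle(order)
    unclustered = set(order)
    clusters = []
    for pivot in order:
        if pivot not in unclustered:
            continue
        cluster = {pivot} | {v for v in unclustered if frozenset((pivot, v)) in plus}
        unclustered -= cluster
        clusters.append(cluster)
    return clusters


def disagreements(items, plus, clusters):
    """Correlation clustering cost: '+' pairs split across clusters
    plus '-' pairs (all other pairs) placed in the same cluster."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(
        (label[a] == label[b]) != (frozenset((a, b)) in plus)
        for a, b in combinations(items, 2)
    )


def cluster_randomize(clusters, rng, p=0.5):
    """Cluster randomization: one coin flip per cluster, shared by all its items."""
    assignment = {}
    for cluster in clusters:
        treated = rng.random() < p
        for item in cluster:
            assignment[item] = treated
    return assignment


def treated_fraction(edges, assignment):
    """Hypothetical exposure measure: share of each user's linked items that are treated."""
    totals, treated = {}, {}
    for user, item in edges:
        totals[user] = totals.get(user, 0) + 1
        treated[user] = treated.get(user, 0) + int(assignment[item])
    return {u: treated[u] / totals[u] for u in totals}


if __name__ == "__main__":
    # Toy user-item review edges, made up for illustration.
    edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u2", "i3"), ("u3", "i4")]
    rng = random.Random(0)
    items, plus = similar_pairs(edges)
    clusters = pivot_cluster(items, plus, rng)
    print("clusters:", clusters)
    print("disagreements:", disagreements(items, plus, clusters))
    assignment = cluster_randomize(clusters, rng)
    print("treated fraction per user:", treated_fraction(edges, assignment))
```

On this toy graph every pivot order leaves exactly one disagreement, because the '+' pairs i1-i2 and i2-i3 cannot both be kept together without also grouping the '-' pair i1-i3. For unweighted complete instances, Pivot is known to be a 3-approximation in expectation; the paper's power-maximizing objective would replace the placeholder '+'/'-' rule used here.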
