Design and Analysis of Cluster-Randomized Field Experiments in Panel Data Settings

Field experiments that randomize at the level of the village, city, state, region, or even country are becoming commonplace in the social sciences. While such designs are convenient, the limited number of clusters available for treatment and control can complicate subsequent data analysis. Through a battery of Monte Carlo simulations, we examine best practices for estimating unit-level treatment effects in cluster-randomized field experiments, particularly in settings that generate short panel data. In most settings we consider, unit-level estimation with unit fixed effects and cluster-level estimation weighted by the number of units per cluster tend to be robust to potentially problematic features of the data while providing greater statistical power. Using insights from our analysis, we evaluate a unique field experiment: a nationwide tipping experiment conducted across markets on the Uber app. Beyond showing how tipping affects aggregate market outcomes, we provide several insights on generating and analyzing cluster-randomized experimental data when the number of experimental units in treatment and control is constrained.
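
To make the two estimators named above concrete, the following is a minimal sketch, not the paper's simulation design. It assumes a balanced short panel, a hypothetical data-generating process (unit effects, period effects, cluster-period shocks, and a constant treatment effect switched on in later periods for treated clusters), and illustrative function names (`simulate_panel`, `unit_fe_estimate`, `weighted_cluster_estimate`). It computes point estimates only and does not address inference with few clusters.

```python
import numpy as np


def simulate_panel(n_clusters=20, n_periods=4, true_effect=0.5, seed=0):
    """Simulate a balanced short panel with cluster-level randomization.

    All distributional choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    sizes = rng.integers(5, 40, size=n_clusters)            # unequal cluster sizes
    cluster_of_unit = np.repeat(np.arange(n_clusters), sizes)
    n_units = cluster_of_unit.size

    # Randomize half of the clusters into treatment; treatment starts mid-panel.
    treated = rng.permutation(n_clusters) < n_clusters // 2
    post = np.arange(n_periods) >= n_periods // 2
    d = (treated[cluster_of_unit][:, None] & post[None, :]).astype(float)

    unit_fe = rng.normal(size=n_units)
    period_fe = rng.normal(size=n_periods)
    cluster_shock = rng.normal(scale=0.5, size=(n_clusters, n_periods))
    noise = rng.normal(size=(n_units, n_periods))
    y = (unit_fe[:, None] + period_fe[None, :]
         + cluster_shock[cluster_of_unit] + true_effect * d + noise)
    return y, d, cluster_of_unit


def unit_fe_estimate(y, d):
    """Unit-level OLS with unit and period fixed effects.

    In a balanced panel, two-way demeaning removes both sets of effects.
    """
    def demean(a):
        return (a - a.mean(axis=1, keepdims=True)
                - a.mean(axis=0, keepdims=True) + a.mean())

    y_t, d_t = demean(y), demean(d)
    return (d_t * y_t).sum() / (d_t ** 2).sum()


def weighted_cluster_estimate(y, d, cluster_of_unit):
    """Cluster-period means regressed on treatment, cluster dummies, and
    period dummies, weighted by the number of units per cluster."""
    n_clusters = cluster_of_unit.max() + 1
    n_periods = y.shape[1]
    sizes = np.bincount(cluster_of_unit, minlength=n_clusters).astype(float)

    def cluster_means(a):
        sums = np.column_stack([
            np.bincount(cluster_of_unit, weights=a[:, t], minlength=n_clusters)
            for t in range(n_periods)
        ])
        return sums / sizes[:, None]

    y_bar, d_bar = cluster_means(y), cluster_means(d)
    g_idx = np.repeat(np.arange(n_clusters), n_periods)
    t_idx = np.tile(np.arange(n_periods), n_clusters)
    X = np.column_stack([
        d_bar.ravel(),
        np.eye(n_clusters)[g_idx],          # cluster fixed effects
        np.eye(n_periods)[t_idx][:, 1:],    # period effects (first period omitted)
    ])
    w = np.sqrt(sizes[g_idx])               # WLS via sqrt-weight rescaling
    beta, *_ = np.linalg.lstsq(X * w[:, None], y_bar.ravel() * w, rcond=None)
    return beta[0]


y, d, cluster_of_unit = simulate_panel()
b_unit = unit_fe_estimate(y, d)
b_cluster = weighted_cluster_estimate(y, d, cluster_of_unit)
print("unit-level FE estimate:       ", b_unit)
print("size-weighted cluster estimate:", b_cluster)
```

Under these assumptions (balanced panel, treatment varying only at the cluster-period level), the two point estimates agree up to floating-point error, because weighting cluster means by cluster size reproduces the unit-level sums. With unbalanced panels or other weighting schemes the estimates would generally differ.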
