When Should You Adjust Standard Errors for Clustering?

In empirical work in economics it is common to report standard errors that account for clustering of units. Typically, the motivation given for the clustering adjustments is that unobserved components in outcomes for units within clusters are correlated. However, because correlation may occur across more than one dimension, this motivation makes it difficult to justify why researchers use clustering in some dimensions, such as geography, but not in others, such as age cohort or gender. It also makes it difficult to explain why one should not cluster with data from a randomized experiment. In this paper, we argue that clustering is in essence a design problem: either a sampling design issue or an experimental design issue. It is a sampling design issue if sampling follows a two-stage process where, in the first stage, a subset of clusters is sampled randomly from a population of clusters and, in the second stage, units are sampled randomly from the sampled clusters. In this case the clustering adjustment is justified by the fact that there are clusters in the population that we do not see in the sample. Clustering is an experimental design issue if the assignment is correlated within the clusters. We take the view that this second perspective best fits the typical setting in economics where clustering adjustments are used. This perspective allows us to shed new light on three questions: (i) when should one adjust the standard errors for clustering, (ii) when is the conventional adjustment for clustering appropriate, and (iii) when does the conventional adjustment of the standard errors matter.
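The experimental-design view can be illustrated with a small Monte Carlo. The sketch below is not the paper's own analysis; it assumes a data-generating process chosen purely for illustration (50 equal-size clusters of 20 units, a cluster-level component in the potential outcomes, and a constant treatment effect). It draws the potential outcomes once, holds them fixed, re-randomizes only the treatment assignment, and compares the spread of the OLS estimate with the average Eicker-Huber-White (EHW) robust standard error and the average Liang-Zeger cluster-robust standard error. The function names `ols_with_ses` and `simulate` are hypothetical. Under these assumptions one would expect EHW standard errors to understate the variability of the estimate when whole clusters are assigned together, and the two standard errors to be close when units are assigned individually, even though outcomes are correlated within clusters in both cases.

```python
import numpy as np


def ols_with_ses(y, d, cluster):
    """Regress y on an intercept and d. Return the coefficient on d with its
    EHW robust SE and its Liang-Zeger cluster-robust SE.
    No finite-sample corrections are applied to either variance estimate."""
    X = np.column_stack([np.ones_like(y), d])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta

    # EHW meat: sum_i e_i^2 x_i x_i'
    meat_ehw = (X * resid[:, None] ** 2).T @ X
    # Liang-Zeger meat: sum_g s_g s_g', where s_g = sum_{i in g} x_i e_i
    scores = np.column_stack(
        [np.bincount(cluster, weights=X[:, k] * resid) for k in range(X.shape[1])]
    )
    meat_cl = scores.T @ scores

    v_ehw = xtx_inv @ meat_ehw @ xtx_inv
    v_cl = xtx_inv @ meat_cl @ xtx_inv
    return beta[1], np.sqrt(v_ehw[1, 1]), np.sqrt(v_cl[1, 1])


def simulate(cluster_assignment, n_clusters=50, n_per=20, tau=1.0, reps=500, seed=0):
    """Design-based Monte Carlo: potential outcomes are drawn once and held
    fixed; only the treatment assignment is re-randomized in each replication.
    Returns (SD of estimates, mean EHW SE, mean cluster-robust SE)."""
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), n_per)
    # Fixed control outcomes with a cluster component, so they are correlated within clusters
    y0 = rng.normal(size=n_clusters)[cluster] + rng.normal(size=cluster.size)
    y1 = y0 + tau  # constant treatment effect (an assumption of this sketch)

    est, se_ehw, se_cl = [], [], []
    for _ in range(reps):
        if cluster_assignment:
            # Treat half of the clusters: assignment perfectly correlated within cluster
            d = rng.permutation(np.repeat([0.0, 1.0], n_clusters // 2))[cluster]
        else:
            # Treat half of the units, ignoring cluster membership
            d = rng.permutation(np.repeat([0.0, 1.0], cluster.size // 2))
        y = np.where(d == 1.0, y1, y0)
        b, s_ehw, s_cl = ols_with_ses(y, d, cluster)
        est.append(b)
        se_ehw.append(s_ehw)
        se_cl.append(s_cl)

    return np.std(est), np.mean(se_ehw), np.mean(se_cl)


for label, flag in [("cluster-level assignment", True), ("unit-level assignment", False)]:
    sd, ehw, cl = simulate(flag)
    print(f"{label}: SD of estimates {sd:.3f}, mean EHW SE {ehw:.3f}, mean cluster SE {cl:.3f}")
```

Holding the potential outcomes fixed mirrors the design-based perspective described above, in which the uncertainty comes from the assignment mechanism rather than from sampling clusters out of a larger population. The per-cluster score sums are computed with `np.bincount` to avoid an explicit Python loop over clusters.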
