Testing for the appropriate level of clustering in linear regression models

The overwhelming majority of empirical research that uses cluster-robust inference assumes that the clustering structure is known, even though there are often several possible ways in which a dataset could be clustered. We propose two tests for the correct level of clustering in regression models. One test focuses on inference about a single coefficient, and the other on inference about two or more coefficients. We provide both asymptotic and wild bootstrap implementations. The proposed tests work for a null hypothesis of either no clustering or "fine" clustering against alternatives of "coarser" clustering. We also propose a sequential testing procedure to determine the appropriate level of clustering. Simulations suggest that the bootstrap tests perform very well under the null hypothesis and can have excellent power. An empirical example suggests that using the tests leads to sensible inferences.
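As a rough illustration of what a comparison between "fine" and "coarse" clustering rests on, the sketch below computes cluster-robust standard errors for a single slope coefficient at two nested levels, along with the gap between the two score-variance estimates. It is not the test statistic proposed in the paper: it applies no small-sample correction and does not estimate the sampling variability of the gap, and all function and variable names are hypothetical.

```python
import numpy as np


def cluster_score_variance(scores, cluster_ids):
    """Sum of squared within-cluster score totals for one coefficient."""
    # Map arbitrary cluster labels to 0..G-1, then total the scores per cluster.
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    totals = np.bincount(inverse, weights=scores)
    return float(np.sum(totals ** 2))


def compare_clustering_levels(y, x, fine_ids, coarse_ids):
    """Compare cluster-robust inference on the slope at two clustering levels.

    Assumes a regression of y on a constant and one regressor x, with every
    fine cluster nested inside a coarse cluster. Illustrative only.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    # Partial out the constant, then estimate the slope by OLS.
    x_c = x - x.mean()
    sxx = float(x_c @ x_c)
    beta = float(x_c @ y) / sxx
    resid = (y - y.mean()) - beta * x_c

    # Empirical scores for the slope coefficient.
    scores = x_c * resid

    v_fine = cluster_score_variance(scores, fine_ids)
    v_coarse = cluster_score_variance(scores, coarse_ids)

    return {
        "beta": beta,
        "se_fine": float(np.sqrt(v_fine)) / sxx,
        "se_coarse": float(np.sqrt(v_coarse)) / sxx,
        # A gap far from zero, relative to its sampling variability, would
        # point toward the coarser level; that variability is not estimated here.
        "score_variance_gap": v_coarse - v_fine,
    }


if __name__ == "__main__":
    # Simulated data (an assumption for illustration): 8 coarse clusters,
    # each containing 5 fine clusters of 5 observations.
    rng = np.random.default_rng(0)
    fine = np.repeat(np.arange(40), 5)
    coarse = np.repeat(np.arange(8), 25)
    x = rng.normal(size=200)
    y = 1.0 + 2.0 * x + rng.normal(size=200)
    print(compare_clustering_levels(y, x, fine, coarse))
```

When the two sets of cluster labels coincide, the gap is exactly zero by construction. A formal test would additionally need a variance estimate or a bootstrap distribution for the gap, which this sketch deliberately leaves out.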
