Robust Post-Matching Inference

Alberto Abadie, Department of Economics, MIT, abadie@mit.edu. Jann Spiess, Graduate School of Business, Stanford University, jspiess@stanford.edu. We thank Gary King, seminar participants at Harvard, and the editor (Hongyu Zhao) and referees for helpful comments, and Jaume Vives for expert research assistance. Financial support by the NSF through grant SES 0961707 is gratefully acknowledged.

Nearest-neighbor matching is a popular nonparametric tool to create balance between treatment and control groups in observational studies. As a preprocessing step before regression, matching reduces the dependence on parametric modeling assumptions. In current empirical practice, however, the matching step is often ignored in the calculation of standard errors and confidence intervals. In this article, we show that ignoring the matching step results in asymptotically valid standard errors if matching is done without replacement and the regression model is correctly specified relative to the population regression function of the outcome variable on the treatment variable and all the covariates used for matching. However, standard errors that ignore the matching step are not valid if matching is conducted with replacement or, more crucially, if the second-step regression model is misspecified in the sense indicated above. Moreover, correct specification of the regression model is not required for consistent estimation of treatment effects with matched data. We show that two easily implementable alternatives produce approximations to the distribution of the post-matching estimator that are robust to misspecification. A simulation study and an empirical example demonstrate the empirical relevance of our results.
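The two-step procedure described above (match, then regress on the matched sample) can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the data-generating process, the greedy one-to-one matching rule on a single covariate, and the function names `match_without_replacement` and `ols_with_clustered_se` are all hypothetical choices made for illustration. The abstract does not spell out the two robust alternatives; the sketch assumes standard errors clustered at the level of the matched pair as one misspecification-robust option, and reports them next to the classical standard errors that ignore the matching step.

```python
import numpy as np


def match_without_replacement(x_treated, x_control):
    """Greedy 1:1 nearest-neighbor matching on a scalar covariate.

    Returns, for each treated unit, the index of its matched control.
    Each control is used at most once (matching without replacement).
    """
    available = np.ones(len(x_control), dtype=bool)
    matches = np.empty(len(x_treated), dtype=int)
    for i, x in enumerate(x_treated):
        dist = np.abs(x_control - x)
        dist[~available] = np.inf  # exclude controls already used
        j = int(np.argmin(dist))
        matches[i] = j
        available[j] = False
    return matches


def ols_with_clustered_se(y, X, clusters):
    """OLS coefficients with classical and cluster-robust standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical (homoskedastic) standard errors, ignoring the matching step.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * xtx_inv))

    # Cluster-robust sandwich: sum the scores X_i * e_i within each cluster.
    scores = X * resid[:, None]
    labels, inverse = np.unique(clusters, return_inverse=True)
    cluster_scores = np.zeros((len(labels), k))
    np.add.at(cluster_scores, inverse, scores)
    meat = cluster_scores.T @ cluster_scores
    se_cluster = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_classical, se_cluster


# Simulated data (an assumption of this sketch): the outcome is quadratic
# in x and the true treatment effect is 1.0.
rng = np.random.default_rng(0)
n_treated, n_control = 200, 1000
x_t = rng.normal(0.5, 1.0, n_treated)
x_c = rng.normal(0.0, 1.0, n_control)
y_t = 1.0 + x_t**2 + rng.normal(size=n_treated)
y_c = x_c**2 + rng.normal(size=n_control)

# Step 1: match each treated unit to a distinct control.
m = match_without_replacement(x_t, x_c)

# Step 2: regress the outcome on treatment and x in the matched sample.
# The regression is linear in x, so it is deliberately misspecified.
y = np.concatenate([y_t, y_c[m]])
w = np.concatenate([np.ones(n_treated), np.zeros(n_treated)])
x = np.concatenate([x_t, x_c[m]])
pair = np.concatenate([np.arange(n_treated), np.arange(n_treated)])
X = np.column_stack([np.ones_like(y), w, x])

beta, se_classical, se_cluster = ols_with_clustered_se(y, X, pair)
print("treatment coefficient:", beta[1])
print("classical SE:", se_classical[1])
print("pair-clustered SE:", se_cluster[1])
```

Clustering on the pair identifier lets the two residuals within a matched pair be correlated, which is the dependence that the matching step induces when the regression does not capture the full effect of the covariates. The sketch covers only matching without replacement; with replacement, a control can belong to several matched sets and the cluster definition would need to change accordingly.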
