The use of bootstrapping when using propensity-score matching without replacement: a simulation study

Propensity-score matching is frequently used to estimate the effects of treatments, exposures, and interventions from observational data. An important issue in propensity-score matching is how to estimate the standard error of the estimated treatment effect. Accurate variance estimation permits construction of confidence intervals that have the advertised coverage rates and tests of statistical significance that have the correct type I error rates. There is disagreement in the literature as to how standard errors should be estimated. The bootstrap is a commonly used resampling method that permits estimation of the sampling variability of estimated parameters, yet bootstrap methods are rarely used in conjunction with propensity-score matching. We proposed two different bootstrap methods for use with propensity-score matching without replacement and examined their performance with a series of Monte Carlo simulations. The first method involved drawing bootstrap samples from the matched pairs in the propensity-score-matched sample. The second method involved drawing bootstrap samples from the original sample, estimating the propensity score separately in each bootstrap sample, and creating a matched sample within each of these bootstrap samples. The first method was found to result in estimates of the standard error that were closer to the empirical standard deviation of the sampling distribution of estimated effects.
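The two bootstrap schemes described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the Newton-Raphson logistic fit, the greedy nearest-neighbour matching, the caliper of 0.2 standard deviations of the logit of the propensity score, and the use of a difference in means as the effect estimate are all assumptions made for this example.

```python
import numpy as np

def fit_propensity(X, z, n_iter=25):
    """Hypothetical helper: logistic regression fitted by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30, 30)))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible in awkward resamples.
        hess = Xd.T @ (Xd * w[:, None]) + 1e-8 * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, Xd.T @ (z - p))
    return 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30, 30)))

def greedy_match(ps, z, caliper=0.2):
    """Greedy 1:1 matching without replacement on the logit of the score.
    The caliper width (in SDs of the logit) is an assumed default."""
    ps = np.clip(ps, 1e-6, 1 - 1e-6)
    logit = np.log(ps / (1.0 - ps))
    width = caliper * logit.std()
    available = list(np.flatnonzero(z == 0))
    pairs = []
    for t in np.flatnonzero(z == 1):
        if not available:
            break
        dist = np.abs(logit[available] - logit[t])
        j = int(np.argmin(dist))
        if dist[j] <= width:
            # pop() removes the control, so it cannot be matched again.
            pairs.append((t, available.pop(j)))
    return np.array(pairs).reshape(-1, 2)

def matched_effect(y, pairs):
    """Mean within-pair difference (treated minus control)."""
    return float(np.mean(y[pairs[:, 0]] - y[pairs[:, 1]]))

def bootstrap_pairs_se(y, pairs, n_boot=200, rng=None):
    """First method: resample matched pairs from the matched sample."""
    rng = np.random.default_rng(rng)
    diffs = y[pairs[:, 0]] - y[pairs[:, 1]]
    est = [rng.choice(diffs, size=len(diffs), replace=True).mean()
           for _ in range(n_boot)]
    return float(np.std(est, ddof=1))

def bootstrap_full_se(X, z, y, n_boot=200, rng=None):
    """Second method: resample the original sample, then re-estimate the
    propensity score and re-match within every bootstrap sample."""
    rng = np.random.default_rng(rng)
    n = len(y)
    est = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        Xb, zb, yb = X[idx], z[idx], y[idx]
        pairs = greedy_match(fit_propensity(Xb, zb), zb)
        if len(pairs) > 0:
            est.append(matched_effect(yb, pairs))
    return float(np.std(est, ddof=1))
```

In a simulation, either standard error estimate would be compared with the standard deviation of the estimated effects across simulated data sets, which is the benchmark the abstract refers to.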
