An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies

The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (nonrandomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. I describe 4 different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. I describe balance diagnostics for examining whether the propensity score model has been adequately specified. Furthermore, I discuss differences between regression-based methods and propensity score-based methods for the analysis of observational data. I describe different causal average treatment effects and their relationship with propensity score analyses.
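As a rough illustration of two of the ideas named above (inverse probability of treatment weighting and a balance diagnostic), the following minimal sketch works on simulated data. Everything in it is an assumption made for illustration and is not taken from the article: the data-generating model, the single covariate `x`, the true effect of 2.0, the helper names `fit_logistic` and `standardized_difference`, and the use of a logistic regression fitted by Newton-Raphson as the propensity score model.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Fit a logistic regression by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (z - p)                        # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

def standardized_difference(x, z, w=None):
    """(Weighted) difference in means of x between groups, in pooled-SD units."""
    w = np.ones_like(x, dtype=float) if w is None else w
    m1 = np.average(x[z == 1], weights=w[z == 1])
    m0 = np.average(x[z == 0], weights=w[z == 0])
    v1 = np.average((x[z == 1] - m1) ** 2, weights=w[z == 1])
    v0 = np.average((x[z == 0] - m0) ** 2, weights=w[z == 0])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)

# Hypothetical simulated study: x confounds both treatment z and outcome y.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x)))
y = 2.0 * z + 1.5 * x + rng.normal(size=n)  # assumed true treatment effect: 2.0

# Estimated propensity score: P(z = 1 | x) from the fitted logistic model.
X = np.column_stack([np.ones(n), x])
ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, z)))

# Inverse probability of treatment weights.
w = z / ps + (1 - z) / (1.0 - ps)

# Crude comparison versus weighted comparison (weights normalized within group).
naive_effect = y[z == 1].mean() - y[z == 0].mean()
iptw_effect = (np.average(y[z == 1], weights=w[z == 1])
               - np.average(y[z == 0], weights=w[z == 0]))

# Balance diagnostic for x before and after weighting.
smd_before = standardized_difference(x, z)
smd_after = standardized_difference(x, z, w)

print(f"naive: {naive_effect:.2f}  IPTW: {iptw_effect:.2f}")
print(f"standardized difference before: {smd_before:.2f}  after: {smd_after:.2f}")
```

In this simulated setting the crude difference in means is pulled away from the assumed effect by the confounder, while the weighted contrast should land close to it, and the standardized difference in `x` should shrink toward zero after weighting. The sketch covers only weighting; matching, stratification, and covariate adjustment on the propensity score would need separate code.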
