The use of propensity score methods with survival or time-to-event outcomes: reporting measures of effect similar to those used in randomized experiments

Propensity score methods are increasingly being used to estimate causal treatment effects in observational studies. In medical and epidemiological studies, outcomes are frequently time‐to‐event in nature. Propensity‐score methods are often applied incorrectly when estimating the effect of treatment on time‐to‐event outcomes. This article describes how two different propensity score methods (matching and inverse probability of treatment weighting) can be used to estimate the measures of effect that are frequently reported in randomized controlled trials: (i) marginal survival curves, which describe survival in the population if all subjects were treated or if all subjects were untreated; and (ii) marginal hazard ratios. The use of these propensity score methods allows one to replicate the measures of effect that are commonly reported in randomized controlled trials with time‐to‐event outcomes: both absolute and relative reductions in the probability of an event occurring can be determined. We also provide guidance on variable selection for the propensity score model, highlight methods for assessing the balance of baseline covariates between treated and untreated subjects, and describe the implementation of a sensitivity analysis to assess the effect of unmeasured confounding variables on the estimated treatment effect when outcomes are time‐to‐event in nature. The methods in the paper are illustrated by estimating the effect of discharge statin prescribing on the risk of death in a sample of patients hospitalized with acute myocardial infarction. In this tutorial article, we describe and illustrate all the steps necessary to conduct a comprehensive analysis of the effect of treatment on time‐to‐event outcomes. © 2013 The authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
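As a minimal sketch of the weighting approach summarized above, the following Python code computes inverse probability of treatment weights and a weighted Kaplan-Meier estimate of the marginal survival curve in each treatment arm. It assumes that the propensity scores have already been estimated (for example, by a logistic regression model); the function names `iptw_weights` and `weighted_kaplan_meier` and the toy data are hypothetical, chosen for illustration only, and are not taken from the article.

```python
import numpy as np


def iptw_weights(treated, propensity):
    """Weights 1/e for treated subjects and 1/(1 - e) for untreated subjects."""
    treated = np.asarray(treated, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    return treated / propensity + (1.0 - treated) / (1.0 - propensity)


def weighted_kaplan_meier(time, event, weights):
    """Weighted Kaplan-Meier estimate evaluated at each distinct event time.

    At every event time t the survival estimate is multiplied by
    1 - (weighted events at t) / (weighted number at risk at t).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    weights = np.asarray(weights, dtype=float)

    event_times = np.unique(time[event])
    survival = np.empty(len(event_times))
    current = 1.0
    for i, t in enumerate(event_times):
        at_risk = weights[time >= t].sum()
        events = weights[(time == t) & event].sum()
        current *= 1.0 - events / at_risk
        survival[i] = current
    return event_times, survival


# Toy data, invented for illustration (not from the article).
time = np.array([2.0, 3.0, 5.0, 7.0, 4.0, 6.0, 8.0, 9.0])
event = np.array([1, 1, 0, 1, 1, 0, 1, 0])        # 1 = event, 0 = censored
treated = np.array([1, 1, 1, 1, 0, 0, 0, 0])
propensity = np.array([0.8, 0.6, 0.5, 0.4, 0.3, 0.5, 0.2, 0.4])  # assumed given

w = iptw_weights(treated, propensity)
is_treated = treated == 1

# One weighted curve per arm approximates the two marginal survival curves.
t1, s1 = weighted_kaplan_meier(time[is_treated], event[is_treated], w[is_treated])
t0, s0 = weighted_kaplan_meier(time[~is_treated], event[~is_treated], w[~is_treated])

print("treated:  ", list(zip(t1, s1)))
print("untreated:", list(zip(t0, s0)))
```

The difference between the two curves at a given time point gives an absolute effect measure. Estimating a marginal hazard ratio would additionally require fitting a weighted Cox model with treatment as the only covariate and a robust variance estimator, which this sketch does not attempt.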
