The propensity score, the probability of exposure to a specific treatment conditional on observed variables, is increasingly being used in observational studies. Creating strata in which subjects are matched on the propensity score allows one to balance measured variables between treated and untreated subjects. There is ongoing controversy in the literature as to which variables to include in the propensity score model. Some advocate including those variables that predict treatment assignment, others suggest including all variables potentially related to the outcome, and still others advocate including only variables that are associated with both treatment and outcome. We provide a case study of the association between drug exposure and mortality to show that including a variable that is related to treatment, but not to outcome, does not improve balance and reduces the number of matched pairs available for analysis. To investigate this issue more comprehensively, we conducted a series of Monte Carlo simulations of the performance of propensity score models that contained variables related to treatment allocation, variables that were confounders for the treatment-outcome pair, variables related to outcome, or all measured variables, whether related to outcome, to treatment, or to neither. We compared the use of these different propensity score models in matching and stratification in terms of the extent to which they balanced variables. We demonstrated that all propensity score models balanced measured confounders between treated and untreated subjects in a propensity-score matched sample. However, including only the true confounders or the variables predictive of the outcome in the propensity score model resulted in a substantially larger number of matched pairs than did using the treatment-allocation model.
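To make the notion of "number of matched pairs" concrete, the following is a minimal sketch of 1:1 matching on the propensity score. It assumes greedy nearest-neighbour matching without replacement, within a caliper on the logit of the score; the text above does not state which matching algorithm or caliper width was used, so the function name `greedy_caliper_match`, the caliper of 0.2, and the toy scores are all illustrative assumptions.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_caliper_match(treated, untreated, caliper):
    """Greedy 1:1 matching without replacement (hypothetical helper).

    treated, untreated: dicts mapping subject id -> propensity score.
    caliper: maximum allowed distance on the logit scale.
    Returns a list of (treated_id, untreated_id) pairs.
    """
    available = dict(untreated)  # untreated subjects not yet used
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest remaining untreated subject on the logit scale.
        best = min(
            available,
            key=lambda u_id: abs(logit(available[u_id]) - logit(t_score)),
        )
        if abs(logit(available[best]) - logit(t_score)) <= caliper:
            pairs.append((t_id, best))
            del available[best]
        # Otherwise the treated subject stays unmatched and is dropped.
    return pairs


# Toy scores, invented for illustration only.
treated_scores = {"t1": 0.30, "t2": 0.60, "t3": 0.95}
untreated_scores = {"u1": 0.28, "u2": 0.62, "u3": 0.40}

pairs = greedy_caliper_match(treated_scores, untreated_scores, caliper=0.2)
print(pairs)
```

In this toy example `t3` has no untreated subject within the caliper and is left unmatched, so only two pairs are formed. Under these assumptions, the number of matched pairs depends on how much the score distributions of treated and untreated subjects overlap.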
Stratifying on the quintiles of any propensity score model resulted in residual imbalance between treated and untreated subjects in the upper and lower quintiles. Greater balance between treated and untreated subjects was obtained after matching on the propensity score than after stratifying on the quintiles of the propensity score. When a confounding variable was omitted from any of the propensity score models, matching or stratifying on the propensity score resulted in residual imbalance in prognostically important variables between treated and untreated subjects. We considered four propensity score models for estimating treatment effects: the model that included only true confounders; the model that included all variables associated with the outcome; the model that included all measured variables; and the model that included all variables associated with treatment selection. Reduction in bias when estimating a null treatment effect was equivalent for all four propensity score models when propensity score matching was used. Reduction in bias was marginally greater for the first two propensity score models than for the last two when stratification on the quintiles of the propensity score was employed. Furthermore, omitting a confounding variable from the propensity score model resulted in biased estimation of the treatment effect. Finally, the mean squared error for estimating a null treatment effect was lower when either of the first two propensity score models was used than when either of the last two was used.
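Balance within propensity score quintiles can be checked along the following lines. This is a sketch under stated assumptions: the text above does not name the balance metric, so the standardized difference (difference in means divided by the pooled standard deviation) is assumed here as a common choice, and the function names and toy data are hypothetical.

```python
import math
import statistics


def quintile_strata(scores):
    """Assign each subject a stratum 0..4 by rank of its propensity score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * 5 // len(scores)
    return strata


def standardized_difference(treated_values, untreated_values):
    """Difference in means divided by the pooled standard deviation."""
    mean_diff = statistics.mean(treated_values) - statistics.mean(untreated_values)
    pooled_sd = math.sqrt(
        (statistics.variance(treated_values) + statistics.variance(untreated_values)) / 2
    )
    if pooled_sd == 0:
        return 0.0 if mean_diff == 0 else math.inf
    return mean_diff / pooled_sd


def balance_by_quintile(scores, treated, covariate):
    """Standardized difference of one covariate within each quintile.

    Returns a dict stratum -> value, or None where a stratum has fewer
    than two treated or two untreated subjects.
    """
    strata = quintile_strata(scores)
    result = {}
    for q in range(5):
        t_vals = [x for s, t, x in zip(strata, treated, covariate) if s == q and t]
        u_vals = [x for s, t, x in zip(strata, treated, covariate) if s == q and not t]
        if len(t_vals) < 2 or len(u_vals) < 2:
            result[q] = None
        else:
            result[q] = standardized_difference(t_vals, u_vals)
    return result


# Toy data, invented for illustration: 20 subjects, alternating treatment.
scores = [i / 21 for i in range(1, 21)]
treated = [i % 2 == 1 for i in range(20)]
covariate = [float(i) for i in range(20)]

balance = balance_by_quintile(scores, treated, covariate)
print(balance)
```

Values near zero in every stratum would indicate good balance. Non-zero values in the outermost strata (0 and 4) would correspond to the kind of residual imbalance in the upper and lower quintiles described above.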