Generalizing observational study results: applying propensity score methods to complex surveys.

OBJECTIVE: To provide a tutorial for using propensity score methods with complex survey data.

DATA SOURCES: Simulated data and the 2008 Medical Expenditure Panel Survey.

STUDY DESIGN: Using simulation, we compared the following methods for estimating the treatment effect: a naïve estimate (ignoring both survey weights and propensity scores), survey weighting, propensity score methods (nearest neighbor matching, weighting, and subclassification), and propensity score methods in combination with survey weighting. Methods are compared in terms of bias and 95 percent confidence interval coverage. In Example 2, we used these methods to estimate the effect on health care spending of having a generalist versus a specialist as a usual source of care.

PRINCIPAL FINDINGS: In general, combining a propensity score method and survey weighting is necessary to achieve unbiased treatment effect estimates that are generalizable to the original survey target population.

CONCLUSIONS: Propensity score methods are an essential tool for addressing confounding in observational studies. Ignoring survey weights may lead to results that are not generalizable to the survey target population. This paper clarifies the appropriate inferences for different propensity score methods and suggests guidelines for selecting an appropriate propensity score method based on a researcher's goal.
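
As a rough illustration of the comparison described above, the following Python sketch contrasts a naïve estimate, survey weighting alone, propensity score weighting alone, and the two combined. It is a minimal sketch under stated assumptions, not the simulation design or the code used in the paper: the data-generating model, the selection probabilities, the choice to fit the propensity model with survey weights, and all function names (`fit_logistic`, `weighted_diff`, `simulate_survey`, `compare_estimators`) are hypothetical choices made for this example. Matching and subclassification are not shown.

```python
import numpy as np


def fit_logistic(x, t, w, n_iter=25):
    """Weighted logistic regression of t on x (with intercept), Newton-Raphson."""
    X = np.column_stack([np.ones(len(t)), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (t - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))  # fitted propensity scores


def weighted_diff(y, t, w):
    """Difference in weighted outcome means, treated minus control."""
    treated = np.sum(w * t * y) / np.sum(w * t)
    control = np.sum(w * (1 - t) * y) / np.sum(w * (1 - t))
    return treated - control


def simulate_survey(n_pop=400_000, seed=0):
    """Hypothetical population with true average treatment effect of 1.

    Assumptions (illustrative only): one confounder x, a treatment effect
    of 1 + x that varies with x, and a sample that over-represents high x.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_pop)
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x)))
    y = x + (1.0 + x) * t + rng.normal(size=n_pop)
    p_select = 0.02 + 0.18 / (1.0 + np.exp(-1.5 * x))
    sampled = rng.random(n_pop) < p_select
    survey_w = 1.0 / p_select[sampled]  # inverse selection probability
    return x[sampled], t[sampled], y[sampled], survey_w


def compare_estimators():
    x, t, y, survey_w = simulate_survey()
    ones = np.ones(len(y))

    # Propensity score weights ignoring the survey design.
    e_unw = fit_logistic(x, t, ones)
    ps_w = t / e_unw + (1 - t) / (1 - e_unw)

    # Propensity model fit with survey weights (an assumption of this
    # sketch), then propensity weights multiplied by survey weights.
    e_svy = fit_logistic(x, t, survey_w)
    combined_w = survey_w * (t / e_svy + (1 - t) / (1 - e_svy))

    return {
        "naive": weighted_diff(y, t, ones),
        "survey_weighting_only": weighted_diff(y, t, survey_w),
        "ps_weighting_only": weighted_diff(y, t, ps_w),
        "ps_and_survey_weighting": weighted_diff(y, t, combined_w),
    }


estimates = compare_estimators()
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

In this setup the treatment effect varies with the covariate that also drives selection into the sample, so propensity score weighting alone targets the effect in the sample rather than in the population. Only the combined weights should be expected to come close to the population effect of 1.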
