Veridical causal inference using propensity score methods for comparative effectiveness research with medical claims

Medical insurance claims are an increasingly common data source for answering a variety of questions in biomedical research. Although these datasets comprehensively characterize the longitudinal development and progression of disease for a potentially large number of patients, population-based inference from them requires thoughtful modifications to sample selection and analytic strategies relative to other types of studies. Along with complex selection bias and missing data issues, claims-based studies are purely observational, which limits effective understanding and characterization of the treatment differences between the groups being compared. All of these issues contribute to a crisis in reproducibility and replication of comparative findings using medical claims. This paper offers practical guidance on the analytical process and demonstrates methods for estimating causal treatment effects with propensity score methods for several types of outcomes common to such studies: binary, count, time-to-event, and longitudinally varying measures. It also aims to increase the transparency and reproducibility of reporting of results from these investigations. We provide an online version of the paper with readily implementable code for the entire analysis pipeline to serve as a guided tutorial for practitioners. The online version can be accessed at https://rydaro.github.io/. The analytic pipeline is illustrated using a sub-cohort of patients with advanced prostate cancer from the large Clinformatics™ Data Mart Database (OptumInsight, Eden Prairie, Minnesota), consisting of 73 million distinct private payer insurees from 2001 to 2016.
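To make the idea of propensity score estimation for a binary outcome concrete, the following is a minimal sketch of inverse probability of treatment weighting (IPTW) on simulated data. It is not the authors' pipeline, which is available in the online version; the function names (`fit_logistic`, `iptw_risk_difference`, `simulate`), the single confounder, and the simulated effect size of 0.10 are all assumptions made purely for illustration.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)                       # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def iptw_risk_difference(X, t, y):
    """Estimate the marginal risk difference with normalized IPTW weights."""
    beta = fit_logistic(X, t)
    ps = 1.0 / (1.0 + np.exp(-X @ beta))           # estimated propensity scores
    w = t / ps + (1 - t) / (1 - ps)                # inverse probability weights
    mu1 = np.sum(w * t * y) / np.sum(w * t)        # weighted risk, treated
    mu0 = np.sum(w * (1 - t) * y) / np.sum(w * (1 - t))  # weighted risk, untreated
    return mu1 - mu0, ps

def simulate(n=50000, seed=0):
    """Hypothetical data: one confounder x drives both treatment and outcome.
    The true risk difference is 0.10 by construction."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x)))
    y = rng.binomial(1, 0.2 + 0.10 * t + 0.15 * (x > 0))
    return X, t, y

X, t, y = simulate()
est, ps = iptw_risk_difference(X, t, y)
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"naive difference: {naive:.3f}")
print(f"IPTW estimate:    {est:.3f}")
```

In this simulated setting the unadjusted difference is biased upward because treated subjects are more likely to have the risk-raising covariate value, while the weighted estimate should fall closer to the simulated effect. A real claims analysis would additionally need balance diagnostics, weight trimming or stabilization, and appropriate variance estimation.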
