High‐dimensional propensity scores for empirical covariate selection in secondary database studies: Planning, implementation, and reporting

Real‐world evidence used for regulatory, payer, and clinical decision‐making requires principled epidemiology in design and analysis, applying methods that minimize confounding given the lack of randomization. One technique for addressing potential confounding is propensity score (PS) analysis, which allows adjustment for measured preexposure covariates. Since its first publication in 2009, the high‐dimensional propensity score (hdPS) method has emerged as an approach that extends traditional PS covariate selection to include large numbers of covariates that may reduce confounding bias in the analysis of healthcare databases. hdPS is an automated, data‐driven analytic approach to covariate selection that empirically identifies preexposure variables and proxies to include in the PS model. This article provides an overview of the hdPS approach and recommendations on the planning, implementation, and reporting of hdPS used for causal treatment‐effect estimation in longitudinal healthcare databases. We supply a checklist of key considerations as a supportive decision tool to aid investigators in the implementation and transparent reporting of hdPS techniques, and to aid decision‐makers unfamiliar with hdPS in understanding and interpreting studies that employ this approach. This article is endorsed by the International Society for Pharmacoepidemiology.
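As a rough illustration of what "empirically identifies" covariates can mean in practice, the sketch below ranks binary candidate covariates by the confounding bias each could introduce if left unadjusted, using the Bross bias formula, and keeps the top k. It is a minimal sketch under stated assumptions, not the reference hdPS implementation: the function names (`bross_bias`, `rank_covariates`), the toy data, and the handling of undefined risk ratios are all hypothetical, and the published algorithm involves further steps (for example, defining data dimensions and assessing code prevalence and recurrence) that are omitted here.

```python
from math import log


def prevalence(flags):
    """Proportion of 1s in a list of 0/1 flags (0.0 for an empty list)."""
    return sum(flags) / len(flags) if flags else 0.0


def bross_bias(covariate, exposure, outcome):
    """Multiplicative bias from omitting one binary covariate (Bross formula).

    bias = (P_C1 * (RR_CD - 1) + 1) / (P_C0 * (RR_CD - 1) + 1)
    where P_C1 and P_C0 are the covariate prevalences among exposed and
    unexposed patients, and RR_CD is the covariate-outcome risk ratio.
    """
    pc1 = prevalence([c for c, e in zip(covariate, exposure) if e == 1])
    pc0 = prevalence([c for c, e in zip(covariate, exposure) if e == 0])
    risk1 = prevalence([d for d, c in zip(outcome, covariate) if c == 1])
    risk0 = prevalence([d for d, c in zip(outcome, covariate) if c == 0])
    if risk1 == 0 or risk0 == 0:
        # Assumption of this sketch: an undefined or zero risk ratio is
        # treated as "no measurable bias" rather than handled specially.
        return 1.0
    rr_cd = risk1 / risk0
    return (pc1 * (rr_cd - 1) + 1) / (pc0 * (rr_cd - 1) + 1)


def rank_covariates(covariates, exposure, outcome, k):
    """Return the names of the k covariates with the largest |log(bias)|."""
    scores = {
        name: abs(log(bross_bias(values, exposure, outcome)))
        for name, values in covariates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy cohort of eight patients (hypothetical data, for illustration only).
exposure = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 0, 1, 1, 0, 0, 0]
candidates = {
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],       # balanced across exposure groups
    "confounder": [1, 1, 1, 0, 1, 0, 0, 0],  # tied to exposure and outcome
}
selected = rank_covariates(candidates, exposure, outcome, k=1)
```

The selected covariates would then enter the PS model, typically alongside investigator-specified covariates. Ranking on the absolute log of the bias treats upward and downward bias symmetrically, which is one simple design choice among several.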
