Using multiple imputation to classify potential outcomes subgroups

As medical tests become increasingly available, concerns about over-testing and over-treatment have grown. It is therefore important to understand how testing influences treatment selection in general practice. Most statistical methods focus on the average effect of testing on treatment decisions. However, an average can be misleading, particularly for patient subgroups that tend not to benefit from such tests. Furthermore, missing data are common and pose a large, often unaddressed threat to the validity of statistical analyses. Finally, it is desirable to conduct analyses that can be interpreted causally. We propose classifying patients into four potential outcomes subgroups, defined by whether a patient's treatment selection is changed by the test result and, if so, in which direction. This classification naturally captures the differential influence of medical testing on treatment selection across patients, which can suggest targets for improving the utilization of medical tests. We can then examine the patient characteristics associated with subgroup membership. We use multiple imputation to simultaneously impute the missing potential outcomes and the regular missing values. This approach can also provide estimates of many traditional causal quantities. We find that explicitly incorporating causal inference assumptions into the multiple imputation process can improve the precision of some causal estimates of interest. We also find that bias can occur when the potential outcomes conditional independence assumption is violated; we propose sensitivity analyses to assess the impact of this violation. We apply the proposed methods to examine the influence of the 21-gene assay, the most commonly used genomic test, on chemotherapy selection among breast cancer patients.
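
The four-subgroup classification can be illustrated with a minimal sketch. It assumes, for illustration only, a binary test result (low or high risk) and a binary treatment decision (0 = no chemotherapy, 1 = chemotherapy), so that each patient has a pair of potential treatment decisions, one under each test result. The subgroup labels and the function names `classify` and `subgroup_proportions` are hypothetical and are not taken from the paper; the sketch operates on a single completed (already imputed) dataset and does not perform the imputation itself.

```python
from collections import Counter

# Hypothetical labels for the four potential outcomes subgroups.
# Keys are (decision if test result is low, decision if test result is high).
SUBGROUPS = {
    (0, 0): "never_treated",         # test result does not change the decision
    (1, 1): "always_treated",        # test result does not change the decision
    (0, 1): "treated_only_if_high",  # test result changes the decision
    (1, 0): "treated_only_if_low",   # changes it in the opposite direction
}


def classify(y_low, y_high):
    """Map a pair of binary potential treatment decisions to a subgroup label."""
    return SUBGROUPS[(y_low, y_high)]


def subgroup_proportions(completed):
    """Share of patients in each subgroup for one completed dataset.

    `completed` is a list of (y_low, y_high) pairs in which the unobserved
    potential outcome has already been filled in (assumed imputed upstream).
    """
    counts = Counter(classify(y_low, y_high) for y_low, y_high in completed)
    n = len(completed)
    return {label: counts.get(label, 0) / n for label in SUBGROUPS.values()}


# Toy completed dataset (made-up values, for illustration only).
toy = [(0, 0), (0, 1), (0, 1), (1, 1)]
proportions = subgroup_proportions(toy)
```

In a multiple imputation analysis, a summary like this would be computed on each completed dataset and then combined across imputations; the combining step is omitted here.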
