Risk-Stratify: Confident Stratification Of Patients Based On Risk

A clinician wants a risk-stratification method that is confident, meaning that the risk estimates assigned to different patients reflect their true risks with high probability. Such estimates support accurate predictions about prognosis and sound decisions about screening and treatment for the current patient. We develop Risk-Stratify, a two-phase algorithm designed to achieve confident risk stratification. In the first phase, we grow a tree to partition the covariate space. Each node in the tree is split using a statistical test that determines whether the risks of the child nodes differ. The choice of test depends on whether the data are censored (log-rank test) or not (U-test). The leaves of the tree form a partition. The risk distribution of the patients in a leaf is guaranteed to differ from that of its sibling leaf, but not necessarily from those of the other leaves. Therefore, some leaves that have similar underlying risks are incorrectly specified as having different risks. In the second phase, we develop a novel recursive graph decomposition approach to address this problem: we merge the leaves of the tree that have similar risks into new leaves, which form the final output. We apply Risk-Stratify to a cohort of patients with no history of cardiovascular disease from UK Biobank and assess their risk of cardiovascular disease. Risk-Stratify significantly improves risk stratification, i.e., a lower fraction of the groups have over- or underestimated risks (measured by the false discovery rate; 33% reduction) compared with state-of-the-art methods for cardiovascular prediction (random forests, the Cox model, etc.). We find that the Cox model significantly overestimates the risk of 21,621 of the 216,211 patients. Risk-Stratify accurately categorizes 2,987 of these 21,621 patients as low-risk individuals.
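
The two ingredients described above, a two-sample test between groups of patients and a merging step over the leaves, can be illustrated with a minimal sketch for the uncensored case. It is not the paper's implementation. The function names, the toy outcome values, and the threshold `alpha = 0.05` are all assumptions made for illustration. The U-test uses the normal approximation without a tie correction. The merging step simply joins leaves that a pairwise U-test cannot distinguish and takes connected components, which is a simplified stand-in for the recursive graph decomposition and applies no multiple-testing correction. Censored data would need a log-rank test in place of the U-test.

```python
import math
from itertools import combinations


def u_test_pvalue(x, y):
    """Two-sided Mann-Whitney U test p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at i and give each its midrank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def merge_similar_leaves(leaves, alpha=0.05):
    """Group leaves whose outcomes the U-test cannot tell apart (illustrative only).

    Assumption: two leaves are linked when p >= alpha, and linked leaves are
    merged via connected components. This is NOT the paper's recursive graph
    decomposition.
    """
    names = list(leaves)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if u_test_pvalue(leaves[a], leaves[b]) >= alpha:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical per-leaf outcome values: leaf_a and leaf_c interleave, leaf_b is far higher.
example_leaves = {
    "leaf_a": list(range(1, 21, 2)),    # 1, 3, ..., 19
    "leaf_b": list(range(101, 111)),    # 101, ..., 110
    "leaf_c": list(range(2, 22, 2)),    # 2, 4, ..., 20
}
merged_groups = merge_similar_leaves(example_leaves)
print(merged_groups)
```

In this toy input, `leaf_a` and `leaf_c` are not siblings' worth of evidence apart, so they end up in one group, while `leaf_b` stays separate. The same p-value function could serve as the split criterion in the first phase, accepting a split only when the two child nodes differ at level `alpha`.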
