Ensemble survival trees for identifying subpopulations in personalized medicine.

Personalized medicine has recently received great attention as a way to improve safety and effectiveness in drug development. It aims to provide medical treatment tailored to a patient's characteristics, such as genomic biomarkers and disease history, so that the benefit of treatment can be optimized. Subpopulation identification divides patients into subgroups, each of which corresponds to an optimal treatment. When the outcome is a survival-time endpoint and two subgroups are sought, the traditional approach is to fit a multivariate Cox proportional hazards model, use it to calculate a risk score for each patient, and separate patients at a cutoff value, commonly the median. However, choosing the median as the cutoff is subjective and may be inappropriate when the data are imbalanced. Here, we propose a novel tree-based method that adopts the relative risk tree algorithm to identify patient subgroups. After growing a relative risk tree, we apply k-means clustering to group the terminal nodes based on their averaged covariates. Because a single tree is known to be unstable, we adopt a bagging ensemble to improve its performance. A simulation study compares the performance of the proposed method with that of the multivariate Cox model. The proposed method is also applied to two public cancer data sets for illustration.
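
The node-grouping step can be illustrated with a minimal sketch. It assumes the relative risk tree has already been grown and that each patient carries the identifier of the terminal node they fall into. The covariate values, the node identifiers, and the helper names `node_means` and `two_means` are hypothetical and chosen only for illustration. The sketch runs plain Lloyd's k-means with k = 2 on the per-node covariate averages; it does not grow the tree or perform the bagging step, and it is not the authors' implementation.

```python
from collections import defaultdict
from math import dist


def node_means(covariates, leaf_ids):
    """Average the covariate vectors of the patients in each terminal node."""
    grouped = defaultdict(list)
    for x, leaf in zip(covariates, leaf_ids):
        grouped[leaf].append(x)
    return {
        leaf: tuple(sum(col) / len(rows) for col in zip(*rows))
        for leaf, rows in grouped.items()
    }


def two_means(points, n_iter=100):
    """Lloyd's algorithm with k=2 and a deterministic farthest-point start."""
    keys = sorted(points)
    first = points[keys[0]]
    # Second starting centroid: the node mean farthest from the first one.
    second = max((points[k] for k in keys), key=lambda p: dist(p, first))
    centroids = [first, second]
    labels = {}
    for _ in range(n_iter):
        # Assignment step: each terminal node joins its nearest centroid.
        new_labels = {
            k: min((0, 1), key=lambda c: dist(points[k], centroids[c]))
            for k in keys
        }
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: recompute each centroid from its member nodes.
        for c in (0, 1):
            members = [points[k] for k in keys if labels[k] == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels


# Hypothetical data: two covariates for eight patients, and the terminal
# node each patient was assigned to by an already-grown tree.
covariates = [
    (0.1, 1.0), (0.3, 1.2), (0.2, 0.8), (0.4, 1.0),
    (2.0, 3.1), (2.2, 2.9), (2.1, 3.3), (1.9, 3.0),
]
leaf_ids = [3, 3, 4, 4, 6, 6, 7, 7]

means = node_means(covariates, leaf_ids)
labels = two_means(means)                  # terminal node -> subgroup
subgroups = [labels[leaf] for leaf in leaf_ids]  # patient -> subgroup
print(labels)
print(subgroups)
```

Clustering the terminal nodes rather than thresholding a risk score means the subgroup boundary is determined by the data, so the two subgroups need not be equal in size as they are with a median cutoff.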
