Cluster-Boosted Multi-Task Learning Framework for Survival Analysis

Accurately predicting the time to an event of interest is an important problem in a wide range of real-world applications. However, prediction is often difficult in medical settings, where many datasets contain a large number of unlabeled (“censored”) instances because labeling is costly and time consuming. Survival analysis focuses on labeled data to predict the time to an event of interest, such as time of death or conversion to a different stage of a progressive disease. Grouping structure, which naturally exists in medical datasets, can be exploited to improve generalization performance by collaboratively learning multiple related survival prediction tasks, one per subgroup. Thus, a multi-task learning framework can connect the survival prediction tasks for different subgroups and learn them simultaneously. To account for censored information and to discover the grouping structure, we propose a novel cluster-boosted multi-task learning framework for survival analysis. We develop an efficient algorithm and demonstrate the performance of the proposed cluster-boosted multi-task survival analysis method on The Cancer Genome Atlas (TCGA) dataset. Our results show that the proposed approach can significantly improve prediction performance in survival analysis while also identifying different subgroups of cancer patients.
