Active Learning based Survival Regression for Censored Data

Time-to-event data can be modelled with survival regression methods, which predict such outcomes in censored-data applications across diverse fields such as engineering, economics and healthcare. Unlike other regression methods, these predictive models must also learn from censored observations, for which the event time is only partially known. Censoring is represented as a binary indicator variable, and machine learning methods have been adapted to account for it. Active learning from censored data with survival regression methods lets the model query a domain expert for the time-to-event label of the sampled instances. This is particularly advantageous in the healthcare domain, where a domain expert can interactively refine the model with their feedback. With this motivation, we propose an active learning based survival model which uses a novel sampling scheme based on the model discriminative gradient. We evaluate this framework on electronic health records (EHR), publicly available survival datasets and synthetic censored datasets of varying diversity. Experimental evaluation against state-of-the-art survival regression methods indicates that the proposed approach has higher discriminative ability. We also present sampling results for the proposed approach in an active learning setting, which indicate better learning rates than other sampling strategies.
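As an illustration of the censoring indicator described above, the following minimal Python sketch stores each instance as an observed time plus a binary event flag and scores a set of risk predictions with Harrell's concordance index. The abstract does not name the metric behind "discriminative ability", so the use of the concordance index here is an assumption, and the function name `concordance_index` and the toy data are hypothetical rather than taken from the paper.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index for right-censored data (illustrative sketch).

    time  : observed time (event time, or censoring time if censored)
    event : 1 if the event was observed, 0 if the instance is censored
    risk  : predicted risk score; higher risk should mean an earlier event
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            # A censored instance cannot be the earlier member of a pair,
            # because its true event time is unknown.
            continue
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # correctly ordered pair
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / comparable if comparable else float("nan")

# Toy data (hypothetical): five instances, two of them censored.
toy_time = [5, 8, 12, 3, 9]
toy_event = [1, 0, 1, 1, 0]
toy_risk = [0.8, 0.3, 0.1, 0.9, 0.4]
toy_cindex = concordance_index(toy_time, toy_event, toy_risk)
```

In this toy example every comparable pair is ordered correctly, so the index equals 1; a value near 0.5 would correspond to random ordering.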