Machine Learning for Survival Analysis: A Case Study on Recurrence of Prostate Cancer

Machine learning techniques have recently received considerable attention, especially for constructing prediction models from data. Despite their potential advantages over standard statistical methods, such as the ability to model non-linear relationships and to construct symbolic, interpretable models, they are rarely applied to survival analysis, primarily because of the difficulty of appropriately handling censored data. In this paper we propose a schema that enables the use of classification methods, including machine learning classifiers, for survival analysis. To account for follow-up time and censoring, we propose a technique that, for patients for whom the event did not occur and who have short follow-up times, estimates their probability of event and assigns them a distribution of outcome accordingly. Since most machine learning techniques do not deal with outcome distributions, the schema is implemented using weighted examples. To show the utility of the proposed technique, we investigate the particular problem of building prognostic models for prostate cancer recurrence, where only the probability of the event (and not its dependence on time) is of interest. A case study on preoperative and postoperative prostate cancer recurrence prediction shows that, with this weighting technique, machine learning tools can stand beside modern statistical methods and may, by inducing symbolic recurrence models, provide further insight into relationships within the modeled data.
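
The weighting idea described above can be illustrated with a minimal sketch. The abstract does not say how the probability of event is estimated for censored patients, so the sketch assumes a Kaplan-Meier estimate of the event-free probability S(t) and a fixed prediction horizon; a patient censored at time t before the horizon is then given the event probability 1 - S(horizon)/S(t) and split into two weighted examples. The function names, the horizon parameter, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, S(time)) steps of a Kaplan-Meier curve (assumed estimator)."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    steps = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_events = 0
        n_total = 0
        # Group all patients sharing the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            n_events += pairs[i][1]
            n_total += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            steps.append((t, surv))
        at_risk -= n_total
    return steps

def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Event-free probability at time t read off the step function."""
    s = 1.0
    for time, surv in steps:
        if time > t:
            break
        s = surv
    return s

def weighted_examples(times: List[float], events: List[int],
                      horizon: float) -> List[Tuple[int, int, float]]:
    """Return (patient index, class label, weight) triples.

    Class 1 means the event occurred by the horizon, class 0 that it did not.
    """
    steps = kaplan_meier(times, events)
    s_horizon = survival_at(steps, horizon)
    examples = []
    for idx, (t, e) in enumerate(zip(times, events)):
        if e and t <= horizon:
            examples.append((idx, 1, 1.0))   # event observed by the horizon
        elif t >= horizon:
            examples.append((idx, 0, 1.0))   # known event-free at the horizon
        else:
            # Censored before the horizon: estimated chance of an event
            # between t and the horizon, given event-free up to t.
            # S(t) > 0 here because this patient was still at risk at t.
            p = 1.0 - s_horizon / survival_at(steps, t)
            if p > 0.0:
                examples.append((idx, 1, p))
            if p < 1.0:
                examples.append((idx, 0, 1.0 - p))
    return examples

# Toy data (hypothetical): follow-up in years, 1 = recurrence, 0 = censored.
toy_times = [2, 3, 4, 5, 8]
toy_events = [1, 0, 1, 0, 0]
toy_examples = weighted_examples(toy_times, toy_events, horizon=6)
```

The resulting triples can be passed to any classifier that accepts per-example weights. Each patient contributes a total weight of 1, so the split changes only how a censored patient's weight is divided between the two classes.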
