Bias in Cross-Entropy-Based Training of Deep Survival Networks

In recent years, the use of deep learning for the analysis of survival data has become attractive to many researchers. This has led to the advent of numerous network architectures for the prediction of possibly censored time-to-event variables. Unlike networks for cross-sectional data (used, e.g., in classification), deep survival networks require the specification of a suitably defined loss function that incorporates typical characteristics of survival data, such as censoring and time-dependent features. Here we provide an in-depth analysis of the cross-entropy loss function, a popular loss function for training deep survival networks. For each time point t, the cross-entropy loss is defined in terms of a binary outcome with levels "event at or before t" and "event after t". Using both theoretical and empirical approaches, we show that this definition may result in a high prediction error and a strong bias in the predicted survival probabilities. To overcome this problem, we analyze an alternative loss function derived from the negative log-likelihood function of a discrete time-to-event model. We show that replacing the cross-entropy loss with the negative log-likelihood loss results in substantially better calibrated prediction rules and also in improved discriminatory power, as measured by the concordance index.
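To make the contrast between the two losses concrete, the display below sketches their per-observation contributions as they are commonly written in the discrete-time survival literature; the notation is ours, not the paper's: $T_i$ denotes the observed event or censoring time on a discrete grid $t = 1, \dots, m$, $\delta_i$ the event indicator, $h(t \mid x_i)$ the discrete hazard predicted by the network, and $S(t \mid x_i) = \prod_{j \le t} \big(1 - h(j \mid x_i)\big)$ the implied survival probability. One common construction of the cross-entropy loss at time point $t$ is

$$
\ell_{\mathrm{CE}}(t) \;=\; -\sum_{i=1}^{n} \Big[\, \mathbb{1}(T_i \le t)\,\log\!\big(1 - S(t \mid x_i)\big) \;+\; \mathbb{1}(T_i > t)\,\log S(t \mid x_i) \,\Big],
$$

whereas the negative log-likelihood of the discrete time-to-event model reads

$$
\ell_{\mathrm{NLL}} \;=\; -\sum_{i=1}^{n} \Big[\, \delta_i \,\log h(T_i \mid x_i) \;+\; (1 - \delta_i)\,\log\!\big(1 - h(T_i \mid x_i)\big) \;+\; \sum_{j=1}^{T_i - 1} \log\!\big(1 - h(j \mid x_i)\big) \,\Big].
$$

A plausible way to see where bias can enter the first construction: under right-censoring, the label $\mathbb{1}(T_i \le t)$ is unobservable for subjects censored before $t$, so any rule that drops or mislabels these subjects changes the population the loss targets. The likelihood-based loss avoids this difficulty because each subject contributes exactly the terms that its observed data identify.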
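As a minimal illustration of how the likelihood-based loss can be implemented for a network that outputs one hazard logit per time interval, here is a hypothetical PyTorch sketch; the function name, tensor layout, and censoring convention are our assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def discrete_nll_loss(hazard_logits: torch.Tensor,
                      times: torch.Tensor,
                      events: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood for a discrete-time survival network (sketch).

    hazard_logits : (n, m) raw scores; sigmoid maps them to the discrete
                    hazards h(j | x_i) for intervals j = 0, ..., m-1.
    times         : (n,) long tensor, 0-based index of the last observed interval.
    events        : (n,) float tensor, 1.0 = event, 0.0 = censored.
    """
    n, m = hazard_logits.shape
    log_h = F.logsigmoid(hazard_logits)      # log h(j | x_i)
    log_1mh = F.logsigmoid(-hazard_logits)   # log(1 - h(j | x_i))

    # Sum log(1 - h) over all intervals strictly before the observed time:
    # every subject is known to have survived these intervals.
    before = torch.arange(m, device=times.device).unsqueeze(0) < times.unsqueeze(1)
    survived = (log_1mh * before).sum(dim=1)

    # Terminal interval: log h(T_i) for an event, log(1 - h(T_i)) for censoring
    # (convention: a censored subject survived its last observed interval).
    at_t_event = log_h.gather(1, times.unsqueeze(1)).squeeze(1)
    at_t_cens = log_1mh.gather(1, times.unsqueeze(1)).squeeze(1)
    terminal = events * at_t_event + (1.0 - events) * at_t_cens

    return -(survived + terminal).mean()

# Example call with synthetic data: 4 subjects, 10 discrete intervals.
logits = torch.randn(4, 10, requires_grad=True)
times = torch.tensor([2, 9, 5, 0])             # 0-based interval indices
events = torch.tensor([1.0, 0.0, 1.0, 1.0])    # 1 = event, 0 = censored
loss = discrete_nll_loss(logits, times, events)
loss.backward()
```

Working on log-hazards via `logsigmoid` keeps the computation numerically stable, and the boolean "before" mask vectorizes the survival sum so that no per-subject loop is needed.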