Assessing flexible models and rule extraction from censored survival data

The evaluation of generic non-linear models for censored data needs to address two complementary requirements of the software development life-cycle: validation and verification. The former involves a rigorous assessment of predictive accuracy in prognostic modelling; the latter is interpreted in this paper as comprising two stages, namely model selection and rule-based interpretation of the composition of prognostic risk groups. With reference to prognostic performance in survival modelling, the well-known ROC framework has recently been extended to a threshold-independent, time-dependent performance index that quantifies the predictive accuracy of censored data models, termed the C' index, which is briefly described. The rule-based framework for direct validation of risk group allocation against expert domain knowledge uses low-order Boolean rules to approximate the response surfaces generated by analytical inference models. In the case of censored data, this approach serves to characterise the allocation of patients into risk groups generated by a risk staging index. Furthermore, the low-order rules define low-dimensional sub-spaces in which individual data points can be directly visualised in relation to the decision boundaries for their risk group. Taken together, the quantitative performance index, Boolean explanatory rules and direct visualisation of the data define a consistent and transparent validation framework based on triangulation of information. This information can be included in decision support systems.
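
As an illustration of how a threshold-independent index can rank-score a model on censored data, the sketch below computes a plain concordance index over comparable patient pairs. It is not the time-dependent C' index referred to above, whose definition is not given here; it is the simpler, time-independent pairwise concordance, shown under the assumption that a higher risk score should correspond to an earlier event. The function name `concordance_index` and its arguments are hypothetical and chosen for this example only.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Fraction of comparable pairs ranked correctly by the risk score.

    A pair (i, j) is treated as comparable when subject i has an observed
    event and a strictly shorter follow-up time than subject j. Pairs whose
    earlier member is censored are skipped, because the true ordering of
    their event times is unknown. Tied risk scores count as half.
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have equal length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

A value of 1 indicates that every comparable pair is ordered correctly, 0.5 corresponds to uninformative scores, and 0 to a fully reversed ordering. The threshold independence comes from using only the rank order of the scores, not a cut-off on them.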