Agreement between two versions of a CADx system: a simulation study

A simulation study was conducted to investigate the agreement between the original and updated versions of a computer-aided diagnosis (CADx) system. The performance of two versions of a CADx system is traditionally compared using metrics derived from the receiver operating characteristic (ROC) curve, such as the area under the curve (AUC). These aggregate standalone performance measures may reveal the overall improvement of the CADx system due to the update, but they do not provide information about how the CADx output changes for individual cases. To address this issue, we used the concordance measure, which compares the ordering of scores for pairs of cases between system versions (i.e., before and after the update of the system). In this preliminary study, the system update that we investigated was an enlargement of the training data set, which is often encountered when a subsequent CADx system version is developed to improve performance. We separately studied the effects of the size of the original training set, the number of features, and the distribution and separation of the two classes in the feature space on the concordance and AUC measures. When the effect of an update was compared among datasets that differed in intrinsic class separation, concordance was in general larger when the intrinsic class separation was larger. The amount of change in AUC between the original and updated CADx systems did not always predict the degree of agreement between the two system versions. A large improvement in AUC could be accompanied by either higher or lower agreement between the original and updated systems. Quantification of the degree of agreement in standalone performance between different versions of a CADx system may serve to define a major algorithm update, and better depict the impact of that update.
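
To make the distinction between the two measures concrete, the following minimal sketch computes a pairwise concordance and the empirical AUC for two sets of scores on the same cases. It is an illustration under stated assumptions, not the study's implementation: the abstract does not give the exact concordance formula, so the sketch assumes concordance is the fraction of all case pairs that both versions order the same way, with a pair tied in both versions counted as concordant. The function names `concordance` and `auc` and the score values are hypothetical.

```python
from itertools import combinations


def concordance(scores_a, scores_b):
    """Fraction of case pairs ordered the same way by both versions.

    Assumed definition (not taken from the source): a pair is concordant
    when both versions rank it in the same direction, or tie it in both.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both versions must score the same cases")
    if len(scores_a) < 2:
        raise ValueError("at least two cases are needed to form a pair")
    concordant = 0
    total = 0
    for i, j in combinations(range(len(scores_a)), 2):
        diff_a = scores_a[i] - scores_a[j]
        diff_b = scores_b[i] - scores_b[j]
        total += 1
        if diff_a * diff_b > 0 or (diff_a == 0 and diff_b == 0):
            concordant += 1
    return concordant / total


def auc(scores, labels):
    """Empirical AUC: probability that a positive case outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six cases (three negative, three positive).
labels = [0, 0, 0, 1, 1, 1]
original = [0.1, 0.4, 0.6, 0.5, 0.7, 0.9]
updated = [0.2, 0.1, 0.3, 0.6, 0.9, 0.8]

auc_original = auc(original, labels)
auc_updated = auc(updated, labels)
agreement = concordance(original, updated)
print(auc_original, auc_updated, agreement)
```

In this toy example the update raises AUC from 8/9 to 1.0, yet only 12 of the 15 case pairs keep their relative order: one pair is reordered across classes, which changes AUC, while two pairs are reordered within a class, which AUC cannot detect. This is the kind of case-level change that an aggregate measure alone does not reveal.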