Sample size and validation issues on the development of CAD systems

Abstract: Classifier design and performance validation are important steps in the development of computer-aided diagnosis (CAD) systems. Within a CAD system, one or more classifiers may be used at various stages to differentiate malignant from benign lesions, or true lesions from false positives. A classifier is trained with case samples drawn from the patient population. The performance of the trained classifier on unseen cases depends on both the quality of the training samples (whether they are statistically representative of the patient population) and their quantity (the sample size). To evaluate the performance of the classifier (or of the CAD system), an independent set of test samples that the classifier has not seen during training should be used. Because the samples with ground truth available in medical imaging research are often limited, the finite sample size is a limiting factor in the development of CAD systems. In this talk, we review some of the issues associated with classifier design and validation under the constraint of a finite sample size.
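The dependence on sample size and on independent testing can be illustrated with a small simulation. The sketch below is not taken from the talk: the nearest-mean classifier, the synthetic Gaussian features, and all parameter values (`dim`, `shift`, `n_train`, `n_test`, `trials`) are assumptions chosen only for illustration. It compares the accuracy measured on the training samples themselves (resubstitution) with the accuracy measured on independent test samples.

```python
import random


def sample(mean_shift, n, dim, rng):
    """Draw n feature vectors with unit-variance Gaussian features (synthetic data)."""
    return [[rng.gauss(mean_shift, 1.0) for _ in range(dim)] for _ in range(n)]


def class_mean(rows):
    """Per-feature mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train(neg, pos):
    """Nearest-mean classifier: the model is just the two class means."""
    return class_mean(neg), class_mean(pos)


def predict(model, x):
    """Assign x to the class whose mean is closer in Euclidean distance."""
    m0, m1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(x, m0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, m1))
    return 1 if d1 < d0 else 0


def accuracy(model, neg, pos):
    """Fraction of correctly classified samples over both classes."""
    correct = sum(predict(model, x) == 0 for x in neg)
    correct += sum(predict(model, x) == 1 for x in pos)
    return correct / (len(neg) + len(pos))


def simulate(n_train, dim=10, shift=0.3, n_test=500, trials=30, seed=0):
    """Return (mean resubstitution accuracy, mean independent-test accuracy).

    n_train and n_test are per-class sample counts; all defaults are
    illustrative assumptions, not values from the talk.
    """
    rng = random.Random(seed)
    resub = held_out = 0.0
    for _ in range(trials):
        tr_neg = sample(0.0, n_train, dim, rng)
        tr_pos = sample(shift, n_train, dim, rng)
        te_neg = sample(0.0, n_test, dim, rng)
        te_pos = sample(shift, n_test, dim, rng)
        model = train(tr_neg, tr_pos)
        resub += accuracy(model, tr_neg, tr_pos)      # tested on training data
        held_out += accuracy(model, te_neg, te_pos)   # tested on unseen data
    return resub / trials, held_out / trials


if __name__ == "__main__":
    for n in (10, 50, 200):
        r, h = simulate(n_train=n)
        print(f"n_train per class = {n:4d}  resubstitution = {r:.3f}  independent test = {h:.3f}")
```

Under these assumptions, the resubstitution estimate tends to be optimistic when the training set is small, and the gap between the two estimates tends to shrink as the training sample size grows.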