Preliminary investigation on CAD system update: effect of selection of new cases on classifier performance

When a computer-aided diagnosis (CAD) system is used in clinical practice, it is desirable that the system be continually and automatically updated with newly acquired cases to improve its performance. In this study, the effect of different case selection methods on such system updates was investigated. The simulation used data for the classification of benign and malignant masses on mammograms. Six image features were used to train three classifiers: linear discriminant analysis (LDA), support vector machine (SVM), and k-nearest neighbors (kNN). Three datasets were randomly sampled from the database: dataset I for initial training of the classifiers, dataset T for intermediate testing and retraining, and dataset E for evaluating the classifiers. Based on the results of intermediate testing, some cases from dataset T were selected and added to the previous training set in each classifier update. In each update, cases were selected using four methods: selection of (a) correctly classified samples, (b) incorrectly classified samples, (c) marginally classified samples, and (d) random samples. For comparison, system updates using all samples in dataset T were also evaluated. In general, the average areas under the receiver operating characteristic curves (AUCs) were almost unchanged with method (a), whereas they generally degraded with method (b). The AUCs improved with methods (c) and (d), although using all available cases generally provided the best or nearly best AUCs. In conclusion, CAD systems may be improved by retraining with new cases accumulated during practice.
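The four selection methods, plus the use-all baseline, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each case in dataset T has a classifier output score in [0, 1] (higher meaning more likely malignant), that a threshold of 0.5 separates the two classes, and that "marginally classified" means the cases whose scores lie closest to that threshold; the abstract does not define any of these. The function name `select_cases` and its parameters are hypothetical.

```python
import random


def select_cases(scores, labels, method, k=None, threshold=0.5, seed=0):
    """Return sorted indices of dataset-T cases to add to the training set.

    scores: classifier outputs in [0, 1] (assumed: higher = more likely malignant)
    labels: ground truth, 1 = malignant, 0 = benign
    k:      number of cases for the "marginal" and "random" methods
            (defaults to half of the cases; an arbitrary choice for this sketch)
    """
    n = len(scores)
    k = n // 2 if k is None else k
    # Assumed decision rule: call a case malignant when its score reaches the threshold.
    predicted = [1 if s >= threshold else 0 for s in scores]

    if method == "correct":      # (a) correctly classified samples
        return [i for i in range(n) if predicted[i] == labels[i]]
    if method == "incorrect":    # (b) incorrectly classified samples
        return [i for i in range(n) if predicted[i] != labels[i]]
    if method == "marginal":     # (c) assumed: the k scores nearest the threshold
        ranked = sorted(range(n), key=lambda i: abs(scores[i] - threshold))
        return sorted(ranked[:k])
    if method == "random":       # (d) k cases drawn without replacement
        return sorted(random.Random(seed).sample(range(n), k))
    if method == "all":          # comparison: every case in dataset T
        return list(range(n))
    raise ValueError(f"unknown method: {method!r}")


# Toy scores and labels, invented purely for illustration.
scores = [0.9, 0.2, 0.55, 0.45, 0.1, 0.7]
labels = [1, 0, 0, 1, 1, 0]
for method in ("correct", "incorrect", "marginal", "random", "all"):
    print(method, select_cases(scores, labels, method, k=2))
```

In a simulation of this kind, the returned indices would identify the cases from dataset T that are appended to the previous training set before the classifier is retrained and then evaluated on dataset E.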
