Feature selection with limited datasets.

Computer-aided diagnosis has the potential to increase diagnostic accuracy by providing a second reading to radiologists. In many computerized schemes, numerous features can be extracted to describe suspect image regions. A subset of these features is then used in a data classifier to determine whether the suspect region is abnormal or normal. Different subsets of features will, in general, yield different classification performance. A feature selection method is often used to determine an "optimal" subset of features to use with a particular classifier, and a classifier performance measure (such as the area under the receiver operating characteristic curve) must be incorporated into this selection process. With limited datasets, however, the performance measure for a given classifier and subset of features is not a single value but follows a distribution. In this paper, we investigate how this distribution of classifier performance causes the selected subset of "optimal" features to differ from the true optimal subset. We consider examples in which the probability that the optimal subset of features is selected can be computed analytically. We show how this probability depends on the dataset sample size, the total number of features from which to select, the number of features selected, and the performance of the true optimal subset. Once a subset of features has been selected, the parameters of the data classifier must be determined. We show that, with limited datasets and/or a large number of features from which to choose, bias is introduced if the classifier parameters are determined using the same data that were employed to select the "optimal" subset of features.
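The two effects described above can be illustrated with a small Monte Carlo sketch. This is not the analytical calculation referred to in the abstract, only a simplified simulation under stated assumptions: every feature is an independent unit-variance Gaussian, a single feature (index 0) is shifted by `shift` in abnormal cases, exactly one feature is selected, and the performance measure is the Mann-Whitney estimate of the area under the ROC curve. The names `auc`, `sample`, and `simulate` are hypothetical. Note that the sketch measures the optimism of the selected feature's apparent area under the curve, which is a simpler stand-in for the bias that arises when classifier parameters are determined on the selection data.

```python
import random


def auc(abnormal, normal):
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = 0.0
    for a in abnormal:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal) * len(normal))


def sample(rng, n_cases, mean):
    """Draw one feature's values for n_cases cases (assumed Gaussian, unit variance)."""
    return [rng.gauss(mean, 1.0) for _ in range(n_cases)]


def simulate(n_per_class, n_features, shift, n_trials=100, seed=0):
    """Select the single best feature by AUC on a small dataset, many times over.

    Feature 0 is the only informative one (assumption of this sketch).
    Returns (fraction of trials in which feature 0 was selected,
             mean AUC of the selected feature on the selection data,
             mean AUC of the selected feature on fresh data).
    """
    rng = random.Random(seed)
    hits = 0
    apparent = 0.0
    independent = 0.0
    for _ in range(n_trials):
        # Features are independent, so each one can be sampled separately.
        scores = []
        for j in range(n_features):
            mean = shift if j == 0 else 0.0
            scores.append(auc(sample(rng, n_per_class, mean),
                              sample(rng, n_per_class, 0.0)))
        # Ties go to the lowest feature index.
        best = max(range(n_features), key=scores.__getitem__)
        hits += best == 0
        apparent += scores[best]
        # Re-estimate the selected feature's AUC on an independent dataset.
        mean = shift if best == 0 else 0.0
        independent += auc(sample(rng, n_per_class, mean),
                           sample(rng, n_per_class, 0.0))
    return hits / n_trials, apparent / n_trials, independent / n_trials


if __name__ == "__main__":
    for n in (10, 20, 40):
        p, app, ind = simulate(n_per_class=n, n_features=20, shift=1.0)
        print(f"n={n:3d}  P(select true best)={p:.2f}  "
              f"apparent AUC={app:.3f}  independent AUC={ind:.3f}")
```

Varying `n_per_class`, `n_features`, and `shift` in this sketch gives a rough feel for the dependencies listed in the abstract; the gap between the apparent and independent estimates is the optimism introduced by reusing the selection data.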
