Unsupervised Feature Selection Applied to Content-Based Retrieval of Lung Images

This paper describes a new hierarchical approach to content-based image retrieval (CBIR) called the "customized-queries" approach (CQA). Unlike the single feature vector approach, which tries to classify the query and retrieve similar images in one step, CQA uses multiple feature sets and a two-step approach to retrieval. The first step classifies the query according to the class labels of the images, using the features that best discriminate the classes. The second step then retrieves the most similar images within the predicted class, using features customized to distinguish "subclasses" within that class. The need to find a customized feature subset for each class led us to investigate feature selection for unsupervised learning. As a result, we developed a new algorithm called FSSEM (feature subset selection using expectation-maximization clustering). We applied our approach to a database of high-resolution computed tomography lung images and show that CQA radically improves retrieval precision over the single feature vector approach. To determine whether our CBIR system is helpful to physicians, we conducted an evaluation trial with eight radiologists. The results show that our system using CQA retrieval doubled the doctors' diagnostic accuracy.
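
The two-step retrieval described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier in step 1, the Euclidean distance in step 2, the function names, and the toy data. The per-class feature subsets are supplied by hand, whereas the paper selects them with FSSEM.

```python
import numpy as np

def classify_query(query, images, labels, class_features):
    """Step 1: predict the query's class from the class-discriminating features.
    A nearest-centroid rule stands in for the classifier (an assumption)."""
    classes = np.unique(labels)
    centroids = np.array(
        [images[labels == c][:, class_features].mean(axis=0) for c in classes]
    )
    dists = np.linalg.norm(centroids - query[class_features], axis=1)
    return classes[np.argmin(dists)].item()

def retrieve_within_class(query, images, labels, predicted, subclass_features, k=2):
    """Step 2: rank only the images of the predicted class, using the feature
    subset customized for that class. Euclidean distance is an assumption."""
    candidates = np.flatnonzero(labels == predicted)
    feats = subclass_features[predicted]
    dists = np.linalg.norm(images[candidates][:, feats] - query[feats], axis=1)
    return candidates[np.argsort(dists)[:k]]

# Toy database: 5 images, 3 features each (hypothetical values).
images = np.array([
    [0.0, 0.0, 0.1],
    [0.1, 0.0, 0.9],
    [0.0, 0.1, 0.5],
    [1.0, 1.0, 0.2],
    [0.9, 1.0, 0.8],
])
labels = np.array([0, 0, 0, 1, 1])

class_features = [0, 1]               # features that separate the classes
subclass_features = {0: [2], 1: [2]}  # hand-picked here; chosen by FSSEM in the paper

query = np.array([0.05, 0.05, 0.85])
predicted = classify_query(query, images, labels, class_features)
top = retrieve_within_class(query, images, labels, predicted, subclass_features)
print(predicted, top.tolist())
```

In this toy setup the first two features separate the classes while the third only varies within a class, so a single distance over all three features would mix both kinds of variation. Splitting the query into two steps lets each step use only the features relevant to it.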
