Model-based classification of visual information for content-based retrieval

Most existing approaches to content-based retrieval rely on query by example, or user sketch based on low-level features. However, these are not suitable for semantic (object level) distinctions. In other approaches, information is classified according to a predefined set of classes and classification is either performed manually or by using class-specific algorithms. Most of these systems lack flexibility: the user does not have the ability to define or change the classes, and new classification schemes require implementation of new class-specific algorithms and/or the input of an expert. In this paper, we present a different approach to content-based retrieval and a novel framework for classification of visual information, in which (1) users define their own visual classes and classifiers are learned automatically, and (multiple fuzzy-classifiers and machine learning techniques are combined for automatic classification at multiple levels (region, perceptual, object-part, object and scene). We present The Visual Apprentice, an implementation of our framework for still images and video that uses a combination of lazy-learning, decision trees, and evolution programs for classification and grouping. Our system is flexible, in that models can be changed by users over time, different types of classifiers are combined, and user-model definitions can be applied to object and scene structure classification. Special emphasis is placed on the difference between semantic and visual classes, and between classification and detection. Examples and results are presented to demonstrate the applicability of our approach to perform visual classification and detection.

[1]  James M. Keller,et al.  A fuzzy K-nearest neighbor algorithm , 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[2]  Euripides G. M. Petrakis,et al.  Similarity Searching in Large Image DataBases , 1994 .

[3]  Andrew W. Moore,et al.  Efficient Algorithms for Minimizing Cross Validation Error , 1994, ICML.

[4]  Thomas P. Minka,et al.  An image database browser that learns from user interaction , 1996 .

[5]  Zbigniew Michalewicz,et al.  Genetic Algorithms + Data Structures = Evolution Programs , 1996, Springer Berlin Heidelberg.

[6]  Shih-Fu Chang,et al.  Video object model and segmentation for content-based video indexing , 1997, Proceedings of 1997 IEEE International Symposium on Circuits and Systems. Circuits and Systems in the Information Age ISCAS '97.

[7]  David W. Aha,et al.  Lazy Learning , 1997, Springer Netherlands.

[8]  Martin Szummer,et al.  Indoor-outdoor image classification , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[9]  Shih-Fu Chang,et al.  AMOS: an active system for MPEG-4 video object segmentation , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[10]  Shih-Fu Chang,et al.  Semantic visual templates: linking visual features to semantics , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[11]  Michael Lindenbaum,et al.  A Generic Grouping Algorithm and Its Quantitative Analysis , 1998, IEEE Trans. Pattern Anal. Mach. Intell..