A Novel Framework for Image Categorization and Automatic Annotation

Image classification and automatic annotation can be treated as effective solutions for enabling keyword-based semantic image retrieval. Traditionally, they have been investigated separately with different models. In this chapter, we propose a novel framework that unites image classification and automatic annotation by learning the semantic concepts of image categories. To choose representative features, a feature selection strategy is proposed and visual keywords are constructed, using both a discrete method and a continuous method. Based on the selected features, the integrated patch (IP) model is proposed to describe an image category. As a generative model, the IP model describes the appearance of combinations of visual keywords while accounting for the diversity of the object. Its parameters are estimated by the EM algorithm. Experimental results on the Corel image dataset and the Getty Image Archive demonstrate that the proposed feature selection strategy and image description model are effective for image categorization and automatic image annotation, respectively.

This chapter appears in the book Semantic-Based Visual Information Retrieval by Y.J. Zhang © 2007, Idea Group Inc.

Introduction

Although content-based image retrieval (CBIR) has been studied for decades, searching images in a large-scale image database remains a challenging problem because of the well-acknowledged semantic gap between low-level features and high-level semantic concepts.
An alternative solution is to use keyword-based approaches, which usually associate images with keywords either by manual labeling or by automatically extracting surrounding text. Although this solution is widely adopted by most existing commercial image search engines, it is not perfect. First, manual annotation, though precise, is expensive and difficult to extend to large-scale databases. Second, automatically extracted surrounding text may be incomplete or ambiguous in describing images, and in some applications surrounding text is not available at all. To overcome these problems, automatic image classification and annotation are considered two promising approaches to understanding and describing the content of images. Besides providing text annotation, successful image categorization will significantly enhance the performance of a content-based image retrieval system by filtering out images from irrelevant classes during matching.

In this chapter, we propose a novel framework for image classification and automatic annotation. First, a feature selection strategy is explicitly introduced, comprising a discrete feature method and a continuous feature method. In both methods, salient patches are detected and quantized for each image category, and the informative salient patches are selected according to a set of rules. The clusters of the selected salient patches form the visual keyword dictionary. Second, for each selected patch, a 64-dimensional feature is extracted. Finally, considering the diversity of the same object across different images, the integrated patch (IP) model is proposed, with parameters estimated by the EM algorithm. For a new test image, its posterior probability in each class is calculated: assigning the single label with the largest probability gives the classification result, while assigning multiple words with the largest probabilities gives the annotation result.
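The quantization step above can be pictured as clustering patch descriptors, with the resulting cluster centers acting as visual keywords. The following minimal k-means sketch illustrates the idea only; it is not the chapter's exact construction. The cluster count, the synthetic data, and the function name build_visual_keywords are illustrative assumptions — only the 64-dimensional patch descriptor follows the text.

```python
import numpy as np

def build_visual_keywords(patches, k, iters=20, seed=0):
    """Cluster patch descriptors with plain k-means; the k centroids
    serve as a toy 'visual keyword dictionary' (illustrative only)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(patches), size=k, replace=False)
    centroids = patches[idx].copy()
    labels = np.zeros(len(patches), dtype=int)
    for _ in range(iters):
        # Assign each patch to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(patches[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster empties.
        for j in range(k):
            members = patches[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy demo: 200 synthetic 64-dimensional patch descriptors from two
# well-separated groups (stand-ins for patches of two image categories).
rng = np.random.default_rng(1)
patches = np.vstack([rng.normal(0, 1, (100, 64)),
                     rng.normal(5, 1, (100, 64))])
keywords, assign = build_visual_keywords(patches, k=2)
print(keywords.shape)
```

In the chapter's setting the inputs would be the detected salient patches of a category rather than synthetic vectors, and the informative-patch selection rules would then prune this dictionary.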
The proposed framework, including the feature selection and the probabilistic model, can be considered generative and has the potential to be implemented on larger-scale image databases. On the surface, the proposed IP model appears similar to some existing models built on the GM-mixture model and the EM algorithm. However, the topological structure of the IP model is quite different. Compared with most annotation methods, the IP model treats the annotation problem as visual categorization with a semantic concept. Compared with some visual categorization methods, the IP model adopts feature selection to learn a semantic concept and thereby avoid over-complexity. Therefore, through the IP model, visual categorization and automatic image annotation can both be implemented effectively. The main contributions of this chapter can be highlighted as follows:

• First, novel feature selection algorithms are proposed. Taking visual quantization into account, both the discrete and continuous feature methods are capable of selecting the most informative features from the detected salient patches.

• Second, a generative model, the IP model, is proposed to describe the image concept. Compared with some discriminative models, such as SVM, the proposed model is generative.
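The generative pipeline described above — fit a mixture model per class with EM, classify by the class with the highest posterior, and annotate by ranking class words by that score — can be sketched as follows. This is a stand-in, not the IP model itself: it substitutes a plain diagonal-Gaussian mixture per class, assumes uniform class priors, and the class names and data are invented for the demo.

```python
import numpy as np

def fit_gmm_em(X, m=2, iters=25, seed=0):
    """Fit an m-component diagonal-Gaussian mixture to X with EM
    (a simplified stand-in for the chapter's IP-model estimation)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, m, replace=False)].copy()
    var = np.tile(X.var(axis=0) + 1e-6, (m, 1))
    w = np.full(m, 1.0 / m)
    for _ in range(iters):
        # E-step: responsibilities from per-component log densities.
        logp = (-0.5 * (((X[:, None] - mu) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(axis=2) + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, and variances.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var

def log_likelihood(X, params):
    """Total log-likelihood of a bag of patch features under one mixture."""
    w, mu, var = params
    logp = (-0.5 * (((X[:, None] - mu) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(axis=2) + np.log(w))
    m = logp.max(axis=1, keepdims=True)
    return float((m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))).sum())

# Toy demo: two classes, each a bag of 8-dimensional patch features.
rng = np.random.default_rng(2)
train = {"tiger": rng.normal(0, 1, (150, 8)),
         "beach": rng.normal(4, 1, (150, 8))}
models = {c: fit_gmm_em(X) for c, X in train.items()}

# A test image is its bag of patches; with uniform priors, the posterior
# ranking reduces to ranking per-class log-likelihoods.
test_patches = rng.normal(0, 1, (30, 8))
scores = {c: log_likelihood(test_patches, p) for c, p in models.items()}
label = max(scores, key=scores.get)                        # classification
annotation = sorted(scores, key=scores.get, reverse=True)  # ranked words
print(label)
```

Taking only the top-ranked word gives the classification result; keeping the top few ranked words gives a (toy) annotation, mirroring the two uses of the posterior described in the text.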