A Novel Approach to Auto Image Annotation Based on Pairwise Constrained Clustering and Semi-Naïve Bayesian Model

Automatic image annotation has recently been studied intensively for content-based image retrieval. In this paper, we propose a novel approach to this task. Our approach first segments images into regions, then clusters the regions, and finally learns the associations between concepts and region clusters from a set of training images with pre-assigned concepts. Our main contributions are as follows. First, in the learning stage, we cluster regions into region clusters under pairwise constraints derived from the language model underlying the annotations assigned to the training images. Second, in the annotation stage, to relax the assumption of independence between region clusters, we develop a greedy selection and joining algorithm that finds independent subsets of region clusters, and we employ a semi-naïve Bayesian (SNB) model to compute the posterior probability of concepts given those independent subsets. Experimental results show that our proposed system, which combines these two strategies, outperforms state-of-the-art techniques on a large image collection.
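As a rough illustration of the semi-naïve Bayesian step, the sketch below computes a posterior over concepts when some region-cluster indicators are grouped into jointly modelled subsets and the subsets are treated as independent of one another. It is a minimal sketch under stated assumptions, not the system described above: the names `train_snb` and `posterior`, the binary presence features, the Laplace smoothing, and the hand-chosen `groups` are all hypothetical, and the greedy selection and joining algorithm that would find the subsets is not shown.

```python
from collections import Counter, defaultdict
from math import exp, log


def train_snb(samples, groups):
    """Count concept priors and joint feature values per group.

    samples: list of (feature tuple, concept) pairs (hypothetical toy format).
    groups:  list of tuples of feature indices; features inside one group are
             modelled jointly, and groups are assumed independent given the concept.
    """
    prior = Counter(concept for _, concept in samples)
    joint = defaultdict(Counter)  # (group index, concept) -> joint value counts
    for x, concept in samples:
        for g, idx in enumerate(groups):
            joint[(g, concept)][tuple(x[i] for i in idx)] += 1
    return prior, joint


def posterior(x, prior, joint, groups, n_values=2, alpha=1.0):
    """Return P(concept | x) under the grouped (semi-naive) factorisation."""
    total = sum(prior.values())
    scores = {}
    for concept, n_c in prior.items():
        score = log(n_c / total)
        for g, idx in enumerate(groups):
            key = tuple(x[i] for i in idx)
            # Laplace smoothing over the n_values ** len(idx) joint outcomes.
            outcomes = n_values ** len(idx)
            count = joint[(g, concept)][key]
            score += log((count + alpha) / (n_c + alpha * outcomes))
        scores[concept] = score
    # Normalise in log space for numerical stability.
    top = max(scores.values())
    z = sum(exp(s - top) for s in scores.values())
    return {concept: exp(s - top) / z for concept, s in scores.items()}
```

In this toy setting, grouping two indicators whose combination (rather than their individual values) predicts the concept lets the model separate cases that a plain naïve Bayes factorisation would score identically.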
