Image annotation with tagprop on the MIRFLICKR set

Image annotation is an important computer vision problem where the goal is to determine the relevance of annotation terms for images. Image annotation has two main applications: (i) proposing a list of relevant terms to users that want to assign indexing terms to images, and (ii) supporting keyword based search for images without indexing terms, using the relevance estimates to rank images. In this paper we present TagProp, a weighted nearest neighbour model that predicts the term relevance of images by taking a weighted sum of the annotations of the visually most similar images in an annotated training set. TagProp can use a collection of distance measures capturing different aspects of image content, such as local shape descriptors, and global colour histograms. It automatically finds the optimal combination of distances to define the visual neighbours of images that are most useful for annotation prediction. TagProp compensates for the varying frequencies of annotation terms using a term-specific sigmoid to scale the weighted nearest neighbour tag predictions. We evaluate different variants of TagProp with experiments on the MIR Flickr set, and compare with an approach that learns a separate SVM classifier for each annotation term. We also consider using Flickr tags to train our models, both as additional features and as training labels. We find the SVMs to work better when learning from the manual annotations, but TagProp to work better when learning from the Flickr tags. We also find that using the Flickr tags as a feature can significantly improve the performance of SVMs learned from manual annotations.

[1]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[2]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[3]  Raimondo Schettini,et al.  Image annotation using SVM , 2003, IS&T/SPIE Electronic Imaging.

[4]  Daniel Gatica-Perez,et al.  PLSA-based image auto-annotation: constraining the latent space , 2004, MULTIMEDIA '04.

[5]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[6]  Christos Faloutsos,et al.  Automatic multimedia cross-modal correlation discovery , 2004, KDD.

[7]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[8]  Daphna Weinshall,et al.  Learning distance functions for image retrieval , 2004, CVPR 2004.

[9]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, CVPR Workshops.

[10]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[13]  Cordelia Schmid,et al.  Coloring Local Feature Extraction , 2006, ECCV.

[14]  David Grangier,et al.  A Discriminative Kernel-based Model to Rank Images from Text Queries , 2007 .

[15]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  James Ze Wang,et al.  Real-Time Computerized Annotation of Pictures , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Samy Bengio,et al.  A Discriminative Kernel-Based Approach to Rank Images from Text Queries , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Yong Wang,et al.  Coherent image annotation by learning semantic distance , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Vasant G Honavar,et al.  Annotating images and image objects using a hierarchical dirichlet process model , 2008, MDM '08.

[20]  Vladimir Pavlovic,et al.  A New Baseline for Image Annotation , 2008, ECCV.

[21]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[22]  C. Schmid,et al.  INRIA-LEARs participation to ImageCLEF 2009 , 2009 .

[23]  Cordelia Schmid,et al.  TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  Rong Jin,et al.  Efficient multi-label ranking for multi-class learning: Application to object recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[26]  Jing Liu,et al.  Image annotation via graph learning , 2009, Pattern Recognit..

[27]  Cordelia Schmid,et al.  Accurate Image Search Using the Contextual Dissimilarity Measure , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .