Distance learning for nearest-neighbor image annotation

Automatic image annotation is an important open problem in computer vision. For this task we propose TagProp, a weighted nearest-neighbor model. It is trained discriminatively and uses training images to predict the labels of test images. The weights are computed from either the rank of a neighbor or its distance to the image. TagProp allows the distance that defines the neighborhoods to be optimized by maximizing the log-likelihood of the predictions on the training set. We can thus optimally tune the combination of several visual similarities, ranging from global color histograms to local shape descriptors. We also propose a word-specific modulation of the predictions to increase the recall of rare words. We compare the different variants of our model with the state of the art on three image datasets. On the five measures considered, TagProp significantly improves on the state of the art.
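
The distance-based variant described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions made for illustration: the neighbor weights are a softmax over a non-negative linear combination of base distances, each neighbor votes for a word with probability 1 - eps if it carries that word and eps otherwise, and the word-specific modulation is a per-word sigmoid. All function and parameter names (`neighbor_weights`, `predict_tags`, `log_likelihood`, `modulate`, `w`, `eps`, `alpha`, `beta`) are hypothetical.

```python
import numpy as np


def neighbor_weights(base_dists, w):
    """Softmax weights over K neighbors (assumed form, not from the abstract).

    base_dists: (K, D) array of D base distances to each neighbor.
    w: (D,) non-negative coefficients combining the base distances.
    """
    combined = base_dists @ w          # one combined distance per neighbor
    logits = -combined
    logits = logits - logits.max()     # stabilise the exponentials
    pi = np.exp(logits)
    return pi / pi.sum()


def predict_tags(pi, neighbor_tags, eps=1e-5):
    """Weighted vote of the neighbors for each word.

    neighbor_tags: (K, W) binary matrix, 1 if neighbor k carries word w.
    Returns a (W,) vector of word-presence probabilities.
    """
    vote = np.where(neighbor_tags == 1, 1.0 - eps, eps)
    return pi @ vote


def log_likelihood(p, y):
    """Bernoulli log-likelihood of binary labels y under probabilities p.

    This is the kind of objective one would maximise over w.
    """
    return float(np.sum(np.where(y == 1, np.log(p), np.log(1.0 - p))))


def modulate(p, alpha, beta):
    """Per-word sigmoid rescaling (assumed form), meant to boost rare words."""
    return 1.0 / (1.0 + np.exp(-(alpha * p + beta)))


# Toy data: 3 neighbors, 2 base distances, 2 words.
base_dists = np.array([[0.2, 0.5], [0.9, 0.4], [1.5, 1.2]])
w = np.array([1.0, 0.5])
neighbor_tags = np.array([[1, 0], [1, 1], [0, 1]])

pi = neighbor_weights(base_dists, w)
p = predict_tags(pi, neighbor_tags)
ll = log_likelihood(p, np.array([1, 0]))
```

In this sketch, closer neighbors receive larger weights, so the words of the nearest training images dominate the prediction. Learning the distance would amount to adjusting `w` so that `log_likelihood` increases on the training set, for instance by gradient ascent with a non-negativity constraint on `w`.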
