Joint Image and Word Sense Discrimination for Image Retrieval

We study the task of learning to rank images given a text query, a problem complicated by query ambiguity: a single query word can have multiple senses. Here, the senses of interest are typically the visually distinct concepts that a user wishes to retrieve. We propose to learn a ranking function that optimizes the ranking cost of interest while simultaneously discovering the disambiguated senses of the query that are optimal for the supervised task. No supervised information about the senses is given. Experiments on web images and the ImageNet dataset show that our approach leads to a clear gain in performance.
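
The abstract does not spell out the model, so the following is only a minimal sketch of one way such joint learning could look, not the paper's actual method. It assumes that each query keeps a small set of latent sense vectors, that an image is scored by its best-matching sense, and that the vectors are trained with a pairwise margin ranking loss by stochastic subgradient steps. The names `score`, `train`, `perfectly_ranked`, and the toy data are all hypothetical.

```python
import numpy as np

def score(W, x):
    """Score image x for one query: best match over the latent sense vectors (rows of W)."""
    return float(np.max(W @ x))

def train(pos, neg, n_senses=2, lr=0.1, epochs=200, margin=1.0, seed=0):
    """Assumed training loop: pairwise margin ranking loss with latent senses.

    No sense labels are used; the sense of each image is whichever row of W
    currently scores it highest.
    """
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n_senses, pos.shape[1]))
    for _ in range(epochs * len(pos)):
        xp = pos[rng.integers(len(pos))]   # a relevant image
        xn = neg[rng.integers(len(neg))]   # an irrelevant image
        sp = int(np.argmax(W @ xp))        # latent sense chosen for xp
        sn = int(np.argmax(W @ xn))        # latent sense chosen for xn
        # Subgradient step only when the pair violates the margin.
        if margin - W[sp] @ xp + W[sn] @ xn > 0:
            W[sp] += lr * xp
            W[sn] -= lr * xn
    return W

def perfectly_ranked(W, pos, neg):
    """True if every relevant image outscores every irrelevant one."""
    return min(score(W, x) for x in pos) > max(score(W, x) for x in neg)

# Toy query with two visually distinct senses: relevant images sit in two
# opposite clusters, irrelevant images sit between them.
pos = np.array([[3.0, 0.2], [2.8, -0.1], [3.2, 0.0],
                [-3.0, 0.1], [-2.9, -0.2], [-3.1, 0.0]])
neg = np.array([[0.2, 0.1], [-0.1, 0.3], [0.0, -0.2], [0.3, -0.3]])

W_two = train(pos, neg, n_senses=2)   # model with two latent senses
W_one = train(pos, neg, n_senses=1)   # single-sense baseline
```

On this toy data no single linear scorer can place both clusters of relevant images above the irrelevant ones, which is why the sketch compares a two-sense model against a one-sense baseline using `perfectly_ranked`.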
