A Neural Network to Retrieve Images from Text Queries

This work presents a neural network for the retrieval of images from text queries. The proposed network is composed of two main modules: the first one extracts a global picture representation from local block descriptors while the second one aims at solving the retrieval problem from the extracted representation. Both modules are trained jointly to minimize a loss related to the retrieval performance. This approach is shown to be advantageous when compared to previous models relying on unsupervised feature extraction: average precision over Corel queries reaches 26.2% for our model, which should be compared to 21.6% for PAMIR, the best alternative.

[1]  Samy Bengio,et al.  A Discriminative Approach for the Retrieval of Images from Text Queries , 2006, ECML.

[2]  Samy Bengio,et al.  Exploiting Hyperlinks to Learn a Retrieval Model , 2005, NIPS 2005.

[3]  Paul A. Viola,et al.  Boosting Image Retrieval , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  Luc Van Gool,et al.  Modeling scenes with local descriptors and latent aspects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[5]  Christophe Garcia,et al.  Convolutional face finder: a neural architecture for fast and robust face detection , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Matti Pietikäinen,et al.  Block-Based Methods for Image Retrieval Using Local Binary Patterns , 2005, SCIA.

[7]  Hanqing Lu,et al.  A practical SVM-based algorithm for ordinal regression in image retrieval , 2003, MULTIMEDIA '03.

[8]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[9]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[10]  J. Rice Mathematical Statistics and Data Analysis , 1988 .

[11]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[12]  R. Manmatha,et al.  Automatic Image Annotation and Retrieval using CrossMedia Relevance Models , 2003 .

[13]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[14]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[15]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[16]  Daniel Gatica-Perez,et al.  PLSA-based image auto-annotation: constraining the latent space , 2004, MULTIMEDIA '04.