Image Retrieval with Structured Object Queries Using Latent Ranking SVM

We consider image retrieval with structured object queries --- queries that specify the objects that should be present in the scene, and their spatial relations. An example of such queries is "car on the road". Existing image retrieval systems typically consider queries consisting of object classes (i.e. keywords). They train a separate classifier for each object class and combine the output heuristically. In contrast, we develop a learning framework to jointly consider object classes and their relations. Our method considers not only the objects in the query ("car" and "road" in the above example), but also related object categories can be useful for retrieval. Since we do not have ground-truth labeling of object bounding boxes on the test image, we represent them as latent variables in our model. Our learning method is an extension of the ranking SVM with latent variables, which we call latent ranking SVM. We demonstrate image retrieval and ranking results on a dataset with more than a hundred of object classes.

[1]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[2]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[3]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[4]  O. Chapelle Large margin optimization of ranking measures , 2007 .

[5]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[6]  James Ze Wang,et al.  Image retrieval: Ideas, influences, and trends of the new age , 2008, CSUR.

[7]  Tsuhan Chen,et al.  From appearance to context-based recognition: Dense labeling in small images , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[10]  Charless C. Fowlkes,et al.  Discriminative models for multi-class object layout , 2009, ICCV.

[11]  Shi-Min Hu,et al.  Sketch2Photo: internet image montage , 2009, ACM Trans. Graph..

[12]  Larry S. Davis,et al.  Understanding videos, constructing plots learning a visually grounded storyline model from annotated videos , 2009, CVPR.

[13]  Matthew B. Blaschko,et al.  Simultaneous Object Detection and Ranking with Weak Supervision , 2010, NIPS.

[14]  Antonio Torralba,et al.  Exploiting hierarchical context on a large database of object categories , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Thomas Deselaers,et al.  ClassCut for Unsupervised Class Segmentation , 2010, ECCV.

[16]  Cyrus Rashtchian,et al.  Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.

[17]  Yejin Choi,et al.  Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.

[18]  Larry S. Davis,et al.  Image ranking and retrieval based on multi-attribute queries , 2011, CVPR 2011.

[19]  Ali Farhadi,et al.  Recognition using visual phrases , 2011, CVPR 2011.

[20]  Yejin Choi,et al.  Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.