Visual Stem Mapping and Geometric Tense Coding for an Augmented Visual Vocabulary

This paper addresses the problem of affine distortions caused by viewpoint changes in image retrieval. We study how to expand the visual words of a query image to improve retrieval recall without sacrificing retrieval precision or efficiency. Our main contribution is the construction of visual dictionaries that retain the mapping relationships between visual words extracted from different viewpoints of the same object. In addition, each mapping rule records the affine transformation by which the two visual words are related, as a compact code of their viewpoint relationship. By analogy with the concepts of verb stem and verb tense in text, we use Visual Stems to denote visual words extracted from robust local patches, and record the relationships between their affine variants as visual stem mapping rules, with the geometric relationships encoded as Geometric Tenses. In this way, our method augments the original visual vocabulary with sufficient and accurate expansion information. In the query phase, only objects that correspond to the same visual stems with coherent geometric tense codes are regarded as similar. Moreover, the mapping rules can be learned offline from only one sample per object. Experiments show that our method supports efficient object retrieval with high recall, at little extra time and space cost over traditional visual vocabularies.
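
The mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class and function names, the integer "tense codes" standing in for affine-transform bins, and the coherence threshold are all assumptions made for the example.

```python
from collections import defaultdict

class AugmentedVocabulary:
    """Hypothetical augmented visual vocabulary: each visual stem keeps
    mapping rules to its affine-variant words, each rule tagged with a
    compact geometric tense code (here an integer viewpoint-bin id)."""

    def __init__(self):
        # stem word id -> {variant word id: geometric tense code}
        self.mapping_rules = defaultdict(dict)

    def add_rule(self, stem_id, variant_id, tense_code):
        # Learned offline from a single sample of the object.
        self.mapping_rules[stem_id][variant_id] = tense_code

    def expand_query(self, query_words):
        """Expand query visual words with their recorded affine variants,
        carrying the tense code so matches can be checked for coherence."""
        expanded = []
        for word in query_words:
            expanded.append((word, 0))  # tense 0: the stem itself
            for variant, tense in self.mapping_rules.get(word, {}).items():
                expanded.append((variant, tense))
        return expanded

def tense_coherent(matched, min_support=2):
    """Accept a candidate object only if its matched words agree on a
    single geometric tense code, i.e. a consistent viewpoint change."""
    votes = defaultdict(int)
    for _word, tense in matched:
        votes[tense] += 1
    return max(votes.values(), default=0) >= min_support
```

For example, if stems 5 and 7 each map to a variant observed under the same viewpoint bin 3, a database object matching both variants passes the coherence check, while an object whose matches disagree on tense is rejected.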
