Fast and robust spatial matching for object retrieval

Spatial matching for visual-words-based object retrieval typically involves generating affine transformation hypotheses and then selecting the best hypothesis to measure spatial consistency. In existing methods, generating an affine transformation hypothesis either requires three correspondences or, when using a single correspondence, assumes the images are taken from a restricted range of viewpoints. In this paper, we propose a novel spatial matching method in which the transformation hypothesis can be estimated from a single correspondence, without any assumption about the viewpoints from which the images are taken. First, affine covariant neighborhoods (ACNs) of features are used to eliminate likely false matches. Second, we decompose the affine transformation into three sub-transforms and estimate each one by exploiting the shape information and the ACNs of a single pair of corresponding features. Experimental results demonstrate that this method noticeably improves average retrieval precision while requiring less computation than previous methods.
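The core idea of estimating a full affine hypothesis from one correspondence can be illustrated with a standard construction from fast spatial matching: each affine covariant feature defines a local affine frame (a matrix mapping a canonical patch to its elliptical region in the image), and a single pair of matched frames determines a hypothesis transform by composing one frame with the inverse of the other. The sketch below shows this construction only; it is not the paper's ACN-based three-way decomposition, and the function name and frame layout are illustrative assumptions.

```python
import numpy as np

def hypothesis_from_single_match(frame_a, frame_b):
    """Affine transformation hypothesis from one feature correspondence.

    frame_a, frame_b: 3x3 affine matrices (last row [0, 0, 1]) mapping a
    canonical unit patch onto each feature's affine-covariant region.
    Returns the 3x3 affine transform taking image-A coordinates to image-B
    coordinates, i.e. the map that carries region A onto region B.
    """
    # The hypothesis sends A's region back to the canonical patch, then
    # forward into B's region: H = frame_b . frame_a^{-1}.
    return frame_b @ np.linalg.inv(frame_a)

# Illustrative frames (hypothetical values): feature A is centered at
# (10, 20) with an anisotropic shape; feature B at (30, 40) with shear.
frame_a = np.array([[2.0, 0.0, 10.0],
                    [0.0, 1.0, 20.0],
                    [0.0, 0.0,  1.0]])
frame_b = np.array([[1.0, 0.5, 30.0],
                    [0.0, 1.0, 40.0],
                    [0.0, 0.0,  1.0]])

H = hypothesis_from_single_match(frame_a, frame_b)
# The hypothesis maps A's center (10, 20) onto B's center (30, 40).
center_b = H @ np.array([10.0, 20.0, 1.0])
```

Because each hypothesis costs one matrix inversion per correspondence rather than a RANSAC sample of three correspondences, every tentative match can be verified exhaustively, which is what makes single-correspondence spatial verification fast.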
