Mining concise and distinctive affine-stable features for object detection in large corpus

Invariant feature extraction is important for object detection. Affine-SIFT (ASIFT) [J.M. Morel and G. Yu, ASIFT: A new framework for fully affine invariant image comparison, SIAM J. Imaging Sci. 2(2) (2009)] has been proved to be fully affine-invariant. However, its high memory cost and long query time hamper its application to large-scale object detection tasks. In this paper, we present a novel algorithm for mining concise and distinctive invariant features, called affine-stable characteristics (ASC). Two new notions, global stability and local stability, are introduced to measure the robustness of each feature from two mutually complementary aspects. Furthermore, to make these stable characteristics more distinctive, spatial information taken from several representative scales is encoded concisely. Experiments show that the robustness of ASC is comparable with that of ASIFT, while the memory cost is reduced to only 5% of ASIFT's. Moreover, compared with the traditional SIFT method [D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60(2) (2004), pp. 91–110], ASC improves object detection accuracy by 38.6% while using a similar number of features.
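The memory cost attributed to ASIFT comes from simulating many affine views of each image and extracting SIFT features from every one of them. The sketch below only enumerates such a view grid; it assumes the default sampling as we recall it from Morel and Yu (tilts t that are powers of √2 up to 4√2, and for each tilt a longitude step of 72°/t over [0°, 180°)). `asift_view_grid` is a hypothetical helper, not code from ASIFT or from this paper, and it performs no image warping or feature extraction.

```python
def asift_view_grid(n_tilts: int = 5, b_deg: float = 72.0) -> list[tuple[float, float]]:
    """Hypothetical helper: enumerate (tilt, longitude in degrees) pairs.

    Assumed sampling (our reading of Morel & Yu, not this paper's method):
    tilts t = sqrt(2)**k for k = 0..n_tilts; for each t > 1, longitudes
    phi = j * b_deg / t for j = 0, 1, 2, ... while phi < 180 degrees.
    """
    views = [(1.0, 0.0)]  # k = 0: the untilted original image
    for k in range(1, n_tilts + 1):
        # 2 ** (k / 2) keeps even powers exact (2.0, 4.0), unlike sqrt(2) ** k.
        t = 2.0 ** (k / 2)
        j = 0
        # Compare j * b < 180 * t rather than j * (b / t) < 180 so that
        # rounding cannot add a spurious view at exactly 180 degrees.
        while j * b_deg < 180.0 * t:
            views.append((t, j * b_deg / t))
            j += 1
    return views


if __name__ == "__main__":
    views = asift_view_grid()
    print(f"{len(views)} simulated views")
    for k in range(6):
        t = 2.0 ** (k / 2)
        n = sum(1 for tilt, _ in views if tilt == t)
        print(f"t = {t:.3f}: {n} view(s)")
```

Under these assumptions the grid contains 43 views (1, 4, 5, 8, 10 and 15 per tilt). Because SIFT is run on each simulated view, the number of stored descriptors grows with the size of this grid, which is the overhead that a concise selection of stable features is meant to avoid.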

[1] Cordelia Schmid et al. A Comparison of Affine Region Detectors. International Journal of Computer Vision, 2005.

[2] Wen Wu et al. Object fingerprints for content analysis with applications to street landmark localization. ACM Multimedia, 2008.

[3] Cordelia Schmid et al. Hamming Embedding and Weak Geometry Consistency for Large Scale Image Search (extended version). 2008.

[4] Harry Shum et al. A multi-sample, multi-tree approach to bag-of-words image representation for image retrieval. IEEE 12th International Conference on Computer Vision (ICCV), 2009.

[5] Cordelia Schmid et al. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005.

[6] Jiri Matas et al. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 2004.

[7] Michael Isard et al. Object retrieval with large vocabularies and fast spatial matching. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.

[8] Richard Szeliski et al. City-Scale Location Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.

[9] Cordelia Schmid et al. Scale & Affine Invariant Interest Point Detectors. International Journal of Computer Vision, 2004.

[10] Jean-Michel Morel et al. ASIFT: A New Framework for Fully Affine Invariant Image Comparison. SIAM Journal on Imaging Sciences, 2009.

[11] Jean-Michel Morel et al. On the Consistency of the SIFT Method. 2008.

[12] Andrew Zisserman et al. An Exemplar Model for Learning Object Classes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.

[13] Andrew Zisserman et al. Video Google: a text retrieval approach to object matching in videos. Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV), 2003.

[14] Shree K. Nayar et al. Ordinal Measures for Image Correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998.

[15] Wolfgang Heidrich et al. Cloth Motion Capture. SIGGRAPH '03, 2003.

[16] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.

[17] Michael Isard et al. Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. IEEE 11th International Conference on Computer Vision (ICCV), 2007.

[18] Winston H. Hsu et al. Query expansion for hash-based image object retrieval. ACM Multimedia, 2009.

[19] Cordelia Schmid et al. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search. European Conference on Computer Vision (ECCV), 2008.