Paper information - Rotation and translation covariant match kernels for image retrieval

We propose a geometry-aware aggregated representation for image retrieval. The covariant property is obtained by jointly encoding the angle or location with the SIFT descriptor. Matching over multiple transformations is made efficient by a trigonometric polynomial.

Most image encodings achieve orientation invariance by aligning patches to their dominant orientations, and translation invariance by completely ignoring patch position or by max-pooling. Although successful, such choices introduce too much invariance because they do not guarantee that the patches are rotated or translated consistently. In this paper, we propose a geometry-aware aggregation strategy that jointly encodes the local descriptors together with their patch dominant angle or location. The geometric attributes are encoded in a continuous manner by leveraging explicit feature maps. Our technique is compatible with the generic match kernel formulation and can be employed along with several popular encoding methods, in particular Bag-of-Words, VLAD and the Fisher vector. The method is further combined with an efficient monomial embedding to provide a codebook-free method that aggregates local descriptors into a single vector representation. Invariance is achieved by efficient similarity estimation over multiple rotations or translations, offered by a simple trigonometric polynomial. This strategy is effective for image search, as shown by experiments performed on standard benchmarks for image and particular object retrieval, namely Holidays and Oxford buildings.
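The joint angle encoding and the trigonometric-polynomial matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel weights `GAMMA`, the function names, and the plain sum aggregation are assumptions made for illustration, and the angle kernel is assumed to have the form k(Δθ) = Σ_n γ_n cos(nΔθ), whose explicit feature map is built from cos(nθ) and sin(nθ). Under these assumptions, the similarity between two aggregated images, as a function of a global rotation Δ applied to the second one, reduces to a trigonometric polynomial whose coefficients are computed once from the two vectors.

```python
import math

# Hypothetical kernel weights gamma_n for frequencies n = 0..N.
# These values are an assumption; the abstract does not specify them.
GAMMA = [0.5, 0.3, 0.2]

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def encode(descriptors, angles, gamma=GAMMA):
    """Aggregate descriptors jointly with their dominant angles.

    For each frequency n, returns a (cos_part, sin_part) pair:
    sqrt(gamma_n) * sum_i cos(n*theta_i) * x_i and the sine analogue.
    For n = 0 the sine part is all zeros.
    """
    dim = len(descriptors[0])
    blocks = []
    for n, g in enumerate(gamma):
        w = math.sqrt(g)
        cos_part = [0.0] * dim
        sin_part = [0.0] * dim
        for x, theta in zip(descriptors, angles):
            c = w * math.cos(n * theta)
            s = w * math.sin(n * theta)
            for d in range(dim):
                cos_part[d] += c * x[d]
                sin_part[d] += s * x[d]
        blocks.append((cos_part, sin_part))
    return blocks

def poly_coefficients(enc_x, enc_y):
    """Coefficients (a_n, b_n) of K(delta) = sum_n a_n cos(n delta) + b_n sin(n delta)."""
    coeffs = []
    for (xc, xs), (yc, ys) in zip(enc_x, enc_y):
        a = dot(xc, yc) + dot(xs, ys)
        b = dot(xs, yc) - dot(xc, ys)
        coeffs.append((a, b))
    return coeffs

def similarity(coeffs, delta):
    """Similarity when every angle of the second image is shifted by +delta."""
    return sum(a * math.cos(n * delta) + b * math.sin(n * delta)
               for n, (a, b) in enumerate(coeffs))

def best_rotation(coeffs, steps=360):
    """Grid search for the rotation maximizing the trigonometric polynomial."""
    candidates = [2.0 * math.pi * k / steps for k in range(steps)]
    return max(candidates, key=lambda delta: similarity(coeffs, delta))
```

In this sketch, `poly_coefficients(encode(xs, ax), encode(ys, ay))` is evaluated once per image pair, after which `similarity` can be queried for any number of candidate rotations without re-encoding either image. The same construction would apply to patch location instead of angle, with one feature map per coordinate.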
