Fast Covariant VLAD for Image Search

Vector of locally aggregated descriptor (VLAD) is a popular image encoding approach for its simplicity and better scalability over conventional bag-of-visual-word approach. In order to enhance its distinctiveness and geometric invariance, covariant VLAD (CVLAD) is proposed to pool local features based on their dominant orientations/characteristic scales, which leads to a geometric-aware representation. This representation achieves rotation/scale invariance when being associated with circular matching. However, the circular matching induces several times of computation overhead, which makes CVLAD hardly suitable for large-scale retrieval tasks. In this paper, the issue of computation overhead is alleviated by performing the circular matching in CVLAD's frequency domain. In addition, by operating PCA on CVLAD in its frequency domain, much better scalability is achieved than when it is undertaken in the original feature space. Furthermore, the high-dimensional CVLAD subvectors are converted to dozens of very low-dimensional subvectors, which is possible when transforming the feature into its frequency domain. Nearest neighbor search is therefore undertaken on very low-dimensional subspaces, which becomes easily tractable. The effectiveness of our approach is demonstrated in the retrieval scenario on popular benchmarks comprising up to 1 million database images.

[1]  Grigorios Tsoumakas,et al.  A Comprehensive Study Over VLAD and Product Quantization in Large-Scale Image Retrieval , 2014, IEEE Transactions on Multimedia.

[2]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[3]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[4]  Krystian Mikolajczyk,et al.  Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection , 2013, Comput. Vis. Image Underst..

[5]  Michael Isard,et al.  Bundling features for large scale partial-duplicate web image search , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Cordelia Schmid,et al.  Packing bag-of-features , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[8]  Yannis Avrithis,et al.  To Aggregate or Not to aggregate: Selective Match Kernels for Image Search , 2013, 2013 IEEE International Conference on Computer Vision.

[9]  Patrick Pérez,et al.  Revisiting the VLAD image representation , 2013, ACM Multimedia.

[10]  Mark A. Clements,et al.  Digital Signal Processing and Statistical Classification , 2002 .

[11]  Florent Perronnin,et al.  Modeling the spatial layout of images beyond spatial pyramids , 2012, Pattern Recognit. Lett..

[12]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[13]  Hervé Jégou,et al.  Rotation and translation covariant match kernels for image retrieval , 2015, Comput. Vis. Image Underst..

[14]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[15]  B. S. Manjunath,et al.  Video Annotation Through Search and Graph Reinforcement Mining , 2010, IEEE Transactions on Multimedia.

[16]  Cordelia Schmid,et al.  Improving Bag-of-Features for Large Scale Image Search , 2010, International Journal of Computer Vision.

[17]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[18]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Sunil Arya,et al.  ANN: library for approximate nearest neighbor searching , 1998 .

[20]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[21]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[22]  Koen E. A. van de Sande,et al.  Codemaps - Segment, Classify and Search Objects Locally , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Atsuto Maki,et al.  Visual Instance Retrieval with Deep Convolutional Networks , 2014, ICLR.

[24]  Yongdong Zhang,et al.  Contextual Query Expansion for Image Retrieval , 2014, IEEE Transactions on Multimedia.

[25]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[26]  Olivier Buisson,et al.  Video and image copy detection demo , 2007, CIVR '07.

[27]  Guillaume Gravier,et al.  Oriented pooling for dense and non-dense rotation-invariant features , 2013, BMVC.

[28]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Christoph H. Lampert,et al.  Efficient Subwindow Search: A Branch and Bound Framework for Object Localization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Hervé Jégou,et al.  Orientation Covariant Aggregation of Local Descriptors with Embeddings , 2014, ECCV.

[32]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Hervé Jégou,et al.  Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening , 2012, ECCV.

[34]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[35]  R. Tolimieri,et al.  Algorithms for Discrete Fourier Transform and Convolution , 1989 .

[36]  Qi Tian,et al.  Fast Image Retrieval: Query Pruning and Early Termination , 2015, IEEE Transactions on Multimedia.

[37]  O. Chum,et al.  Geometric min-Hashing: Finding a (thick) needle in a haystack , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Chong-Wah Ngo,et al.  On the Annotation of Web Videos by Efficient Near-Duplicate Search , 2010, IEEE Transactions on Multimedia.

[39]  Cordelia Schmid,et al.  Event Retrieval in Large Video Collections with Circulant Temporal Encoding , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[41]  Arnold W. M. Smeulders,et al.  Locality in Generic Instance Search from One Example , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Trevor Darrell,et al.  Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing) , 2006 .

[43]  Yi Li,et al.  ARISTA - image search to annotation on billions of web photos , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Hsuan-Tien Lin,et al.  Unsupervised Semantic Feature Discovery for Image Object Retrieval and Tag Refinement , 2012, IEEE Transactions on Multimedia.

[46]  David Picard,et al.  Web-Scale Image Retrieval Using Compact Tensor Aggregation of Visual Descriptors , 2013, IEEE MultiMedia.

[47]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[48]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[49]  Andrew Zisserman,et al.  All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Krystian Mikolajczyk,et al.  Spatial Coordinate Coding to reduce histogram representations, Dominant Angle and Colour Pyramid Match , 2011, 2011 18th IEEE International Conference on Image Processing.

[51]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[52]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Xin Chen,et al.  City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[54]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[55]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.