Beyond Sum and Weighted Aggregation: An Efficient Mixed Aggregation Method with Multiple Weights for Image Search

Image search with local descriptors represents an image usually by embedding and aggregating a set of patch descriptors into a single vector. Standard aggregation operations include sum and weighted aggregations. While showing high efficiency, sum aggregation lacks discriminative power. In contrast, weighted aggregation shows promising retrieval performance but suffers extremely high time cost. In this paper, we present a general mixed aggregation method that unifies sum and weighted aggregation methods. Owing to its general formulation, our method is able to balance the trade-off between quality and speed. Furthermore, we propose to compute multiple weighting coefficients rather than one for each to be aggregated vector by partitioning it into several components. Experimental results demonstrate that, while showing over ten times speedup over baselines, the image search frameworks with our mixed aggregation method achieve the state-of-the-art performance. Inspired by our aggregation method, we also present a new embedding strategy. Different from the existing embedding methods that individually map each descriptor into a single embedded vector, our embedding method maps a group of local descriptors into a single vector, which significantly benefits the aggregation step in terms of speed. As demonstrated by the experiments, the retrieval frameworks with our embedding method are more than fifty times faster than baselines, while maintaining competitive retrieval performance.

[1]  Victor S. Lempitsky,et al.  Neural Codes for Image Retrieval , 2014, ECCV.

[2]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[3]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[7]  Ngai-Man Cheung,et al.  Embedding Based on Function Approximation for Large Scale Image Search , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Naila Murray,et al.  Generalized Max Pooling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Ngai-Man Cheung,et al.  FAemb: A function approximation-based embedding method for image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Wei Liu,et al.  Hashing with Graphs , 2011, ICML.

[11]  Jianru Xue,et al.  Democratic Diffusion Aggregation for Image Retrieval , 2016, IEEE Transactions on Multimedia.

[12]  Tong Zhang,et al.  Improved Local Coordinate Coding using Local Tangents , 2010, ICML.

[13]  Jianru Xue,et al.  Fast Democratic Aggregation and Query Fusion for Image Search , 2015, ICMR.

[14]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Guillaume Gravier,et al.  Oriented pooling for dense and non-dense rotation-invariant features , 2013, BMVC.

[16]  Hervé Jégou,et al.  Orientation Covariant Aggregation of Local Descriptors with Embeddings , 2014, ECCV.

[17]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[20]  Andrew Zisserman,et al.  All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Florent Perronnin,et al.  Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Jiri Matas,et al.  Unsupervised discovery of co-occurrence in sparse high dimensional data , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  Andrew Zisserman,et al.  Triangulation Embedding and Democratic Aggregation for Image Search , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Patrick Pérez,et al.  Revisiting the VLAD image representation , 2013, ACM Multimedia.

[25]  Georges Quénot,et al.  Descriptor optimization for multimedia indexing and retrieval , 2013, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI).

[26]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[28]  Cordelia Schmid,et al.  Improving Bag-of-Features for Large Scale Image Search , 2010, International Journal of Computer Vision.