Compressed sensing based feature fusion for image retrieval

In the standard vector of locally aggregated descriptors (VLAD) model, the residual norm of each descriptor with corresponding center to the VLAD vector varies significantly, thus the visual similarity measure based on norm measurement will be corrupted. To address the problem, a weighted-residual is used in this paper to balance the distribution of residual norms of all descriptors in the VLAD vector to a certain degree. Moreover, to improve retrieval accuracy, local gradient and local color features of images are extracted and used to our proposed weighted VLAD method. Also, the global structural features and multiple weighted VLAD vectors are fused to represent an image. In order to better fuse these different image features, due to the advantage of collecting key information of original signal by compressed sensing (CS), it is used as a converter to transform the various features into a same feature subspace. Our proposed method is evaluated on three benchmark datasets, i.e., Holidays, Ukbench and Oxford5k. Experimental results show that our proposed method achieves good performances compared with other methods.

[1]  Grigorios Tsoumakas,et al.  A Comprehensive Study Over VLAD and Product Quantization in Large-Scale Image Retrieval , 2014, IEEE Transactions on Multimedia.

[2]  Andrew Zisserman,et al.  Efficient Visual Search of Videos Cast as Text Retrieval , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Cordelia Schmid,et al.  Learning Color Names for Real-World Applications , 2009, IEEE Transactions on Image Processing.

[4]  Peng Fan,et al.  Color-SURF: A surf descriptor with local kernel color histograms , 2009, 2009 IEEE International Conference on Network Infrastructure and Digital Content.

[5]  Patrick Pérez,et al.  Revisiting the VLAD image representation , 2013, ACM Multimedia.

[6]  Zhiguo Jiang,et al.  SIFT-based Elastic sparse coding for image retrieval , 2012, ICIP.

[7]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[8]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[9]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Fahad Shahbaz Khan,et al.  The Impact of Color on Bag-of-Words Based Object Recognition , 2010, 2010 20th International Conference on Pattern Recognition.

[11]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Xi Wang,et al.  A novel multi-feature fusion and sparse coding-based framework for image retrieval , 2014, 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC).

[13]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[14]  Qi Tian,et al.  SIFT Meets CNN: A Decade Survey of Instance Retrieval , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[16]  Hervé Jégou,et al.  Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening , 2012, ECCV.

[17]  Victor S. Lempitsky,et al.  Neural Codes for Image Retrieval , 2014, ECCV.

[18]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19]  James Ze Wang,et al.  Content-based image retrieval: approaches and trends of the new age , 2005, MIR '05.

[20]  R.G. Baraniuk,et al.  Compressive Sensing [Lecture Notes] , 2007, IEEE Signal Processing Magazine.

[21]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[23]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[24]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25]  Yi-Gang Cen,et al.  SURF binarization and fast codebook construction for image retrieval , 2017, J. Vis. Commun. Image Represent..

[26]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Andrew Zisserman,et al.  Triangulation Embedding and Democratic Aggregation for Image Search , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[30]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[31]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[32]  Victor S. Lempitsky,et al.  Aggregating Local Deep Features for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Hengyou Wang,et al.  Separable vocabulary and feature fusion for image retrieval based on sparse representation , 2017, Neurocomputing.

[34]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[35]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.