Compact Global Descriptors for Visual Search

The first step in an image retrieval pipeline consists of comparing global descriptors from a large database to find a short list of candidate matching images. The more compact the global descriptor, the faster the descriptors can be compared for matching. State-of-the-art global descriptors based on Fisher Vectors are represented with tens of thousands of floating point numbers. While there is significant work on compression of local descriptors, there is relatively little work on compression of high dimensional Fisher Vectors. We study the problem of global descriptor compression in the context of image retrieval, focusing on extremely compact binary representations: 64-1024 bits. Motivated by the remarkable success of deep neural networks in recent literature, we propose a compression scheme based on deeply stacked Restricted Boltzmann Machines (SRBM), which learn lower dimensional non-linear subspaces on which the data lie. We provide a thorough evaluation of several state-of-the-art compression schemes based on PCA, Locality Sensitive Hashing, Product Quantization and greedy bit selection, and show that the proposed compression scheme outperforms all existing schemes.

[1]  Chuohao Yeo,et al.  Rate-efficient visual correspondences using random projections , 2008, 2008 15th IEEE International Conference on Image Processing.

[2]  Geoffrey E. Hinton,et al.  Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[3]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Geoffrey E. Hinton,et al.  3D Object Recognition with Deep Belief Nets , 2009, NIPS.

[8]  Florent Perronnin,et al.  Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[10]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[11]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[12]  Bernd Girod,et al.  Memory-Efficient Image Databases for Mobile Visual Search , 2014, IEEE MultiMedia.

[13]  Sanjiv Kumar,et al.  Learning Binary Codes for High-Dimensional Data Using Bilinear Projections , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Bernd Girod,et al.  Interframe Coding of Feature Descriptors for Mobile Augmented Reality , 2014, IEEE Transactions on Image Processing.

[16]  Bernd Girod,et al.  Compressed Histogram of Gradients: A Low-Bitrate Descriptor , 2011, International Journal of Computer Vision.

[17]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[18]  Kristen Grauman,et al.  Learning Binary Hash Codes for Large-Scale Image Search , 2013, Machine Learning for Computer Vision.

[19]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[20]  Bernd Girod,et al.  Residual enhanced visual vector as a compact signature for mobile visual search , 2013, Signal Process..

[21]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[22]  Wen Gao,et al.  Robust fisher codes for large scale image retrieval , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[23]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[24]  Bart Thomee,et al.  New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative , 2010, MIR '10.

[25]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[26]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[27]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[28]  Bernd Girod,et al.  Transform coding of image feature descriptors , 2009, Electronic Imaging.