论文信息 - DeepHash: Getting Regularization, Depth and Fine-Tuning Right

DeepHash: Getting Regularization, Depth and Fine-Tuning Right

This work focuses on representing very high-dimensional global image descriptors using very compact 64-1024 bit binary hashes for instance retrieval. We propose DeepHash: a hashing scheme based on deep networks. Key to making DeepHash work at extremely low bitrates are three important considerations -- regularization, depth and fine-tuning -- each requiring solutions specific to the hashing problem. In-depth evaluation shows that our scheme consistently outperforms state-of-the-art methods across all data sets for both Fisher Vectors and Deep Convolutional Neural Network features, by up to 20 percent over other schemes. The retrieval performance with 256-bit hashes is close to that of the uncompressed floating point features -- a remarkable 512 times compression.

[1] Kristen Grauman,et al. Kernelized locality-sensitive hashing for scalable image search , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[2] Geoffrey E. Hinton,et al. Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[3] Christian Szegedy,et al. DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Yann LeCun,et al. Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[5] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[6] Kristen Grauman,et al. Learning Binary Hash Codes for Large-Scale Image Search , 2013, Machine Learning for Computer Vision.

[7] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8] David J. Fleet,et al. Minimal Loss Hashing for Compact Binary Codes , 2011, ICML.

[9] Huizhong Chen,et al. The stanford mobile visual search data set , 2011, MMSys.

[10] Bernd Girod,et al. Memory-Efficient Image Databases for Mobile Visual Search , 2014, IEEE MultiMedia.

[11] Svetlana Lazebnik,et al. Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[12] Chuohao Yeo,et al. Rate-efficient visual correspondences using random projections , 2008, 2008 15th IEEE International Conference on Image Processing.

[13] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[14] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15] Matthieu Cord,et al. Unsupervised and Supervised Visual Codes with Restricted Boltzmann Machines , 2012, ECCV.

[16] Cordelia Schmid,et al. Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Shih-Fu Chang,et al. Spherical hashing , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[19] Victor S. Lempitsky,et al. Neural Codes for Image Retrieval , 2014, ECCV.

[20] Sam S. Tsai,et al. Survey of SIFT Compression Schemes , 2010 .

[21] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[22] Wei Liu,et al. Learning Hash Codes with Listwise Supervision , 2013, 2013 IEEE International Conference on Computer Vision.

[23] Geoffrey E. Hinton,et al. 3D Object Recognition with Deep Belief Nets , 2009, NIPS.

[24] Cordelia Schmid,et al. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[25] Rongrong Ji,et al. Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Yann LeCun,et al. Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27] Michael Isard,et al. Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[28] Anil K. Jain,et al. On-line signature verification, , 2002, Pattern Recognit..

[29] Antonio Torralba,et al. Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[30] Bart Thomee,et al. New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative , 2010, MIR '10.

[31] Antonio Torralba,et al. Spectral Hashing , 2008, NIPS.

[32] Bernd Girod,et al. CHoG: Compressed histogram of gradients A low bit-rate feature descriptor , 2009, CVPR.

[33] Honglak Lee,et al. Sparse deep belief net model for visual area V2 , 2007, NIPS.

[34] Xiaogang Wang,et al. Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35] Yann LeCun,et al. Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[36] Sanjiv Kumar,et al. Learning Binary Codes for High-Dimensional Data Using Bilinear Projections , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[37] Robert C. Bolles,et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[38] Bernd Girod,et al. Residual enhanced visual vector as a compact signature for mobile visual search , 2013, Signal Process..

[39] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[41] Jitendra Malik,et al. Analyzing the Performance of Multilayer Neural Networks for Object Recognition , 2014, ECCV.

[42] David Nistér,et al. Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[43] Wen Gao,et al. Robust fisher codes for large scale image retrieval , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[44] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[45] Shih-Fu Chang,et al. Semi-supervised hashing for scalable image retrieval , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[46] Florent Perronnin,et al. Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[48] Guosheng Lin,et al. Learning Hash Functions Using Column Generation , 2013, ICML.