Deep Learning for Image Retrieval: What Works and What Doesn't

To build an industrial content-based image retrieval (CBIR) system, feature extraction, feature processing and feature indexing all need to be fully considered. Although research in recent years suggests that convolutional neural networks (CNNs) lead in feature extraction and representation for CBIR, there is little guidance on deeper feature-related questions. Examples include which of the feature representations provided by a CNN performs best, how well the extracted features generalize, how dimensionality reduction relates to accuracy loss, which distance measure works best, and how much coding techniques improve efficiency. This research therefore conducts several practical studies and a thorough analysis to answer these questions. Results on both ImageNet-2012 and an industrial dataset provided by Sogou show that fc4096a and fc4096b perform best on datasets from unseen categories. Several interesting and practical conclusions are drawn; for instance, fc4096a and fc4096b generalize better than the other CNN features and could be considered the first choice for industrial CBIR systems. Furthermore, this paper presents a novel feature binarization approach that makes CBIR more efficient: it reduces the storage of the original data by 31/32. Taken together, the conclusions appear to offer practical guidance for real industrial CBIR systems.
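The 31/32 figure is consistent with replacing each 32-bit float dimension by a single bit. The sketch below illustrates that arithmetic and bit-code retrieval in general; it is not the paper's binarization method, which the abstract does not describe. It assumes each dimension is thresholded at its mean over the database, packs the bits into Python ints, and compares ranking by cosine distance on the float vectors with ranking by Hamming distance on the codes. All function names are hypothetical, the 8-dimensional vectors are toy data, and the 4096-dimensional storage calculation assumes the dimensionality suggested by the names fc4096a and fc4096b.

```python
import math
from fractions import Fraction


def mean_thresholds(database):
    """Per-dimension mean over the database, used as the bit threshold (an assumption)."""
    n = len(database)
    return [sum(vec[d] for vec in database) / n for d in range(len(database[0]))]


def binarize(feature, thresholds):
    """Pack one bit per dimension into an int: bit d is 1 if feature[d] > thresholds[d]."""
    code = 0
    for d, (x, t) in enumerate(zip(feature, thresholds)):
        if x > t:
            code |= 1 << d
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def cosine_distance(u, v):
    """1 - cosine similarity between two float vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def rank(query, database, distance):
    """Database indices sorted from nearest to farthest under `distance`."""
    return sorted(range(len(database)), key=lambda i: distance(query, database[i]))


def storage_bytes(dim):
    """(float32 bytes, packed-bit bytes) for one dim-dimensional feature."""
    return dim * 4, math.ceil(dim / 8)


def saved_fraction(dim):
    """Fraction of storage removed by going from float32 to 1 bit per dimension."""
    float_bytes, code_bytes = storage_bytes(dim)
    return 1 - Fraction(code_bytes, float_bytes)


# Toy 8-dimensional "features"; real CNN features are much larger.
DB = [
    [0.9, 0.1, 0.8, 0.0, 0.7, 0.2, 0.0, 0.6],
    [0.1, 0.9, 0.0, 0.8, 0.1, 0.7, 0.9, 0.0],
    [0.8, 0.2, 0.7, 0.1, 0.0, 0.3, 0.1, 0.5],
]
QUERY = [0.85, 0.05, 0.75, 0.05, 0.6, 0.25, 0.05, 0.55]

if __name__ == "__main__":
    thresholds = mean_thresholds(DB)
    codes = [binarize(v, thresholds) for v in DB]
    q_code = binarize(QUERY, thresholds)
    print("cosine ranking:", rank(QUERY, DB, cosine_distance))  # [0, 2, 1]
    print("hamming ranking:", rank(q_code, codes, hamming))     # [0, 2, 1]
    print("bytes for 4096-d:", storage_bytes(4096))             # (16384, 512)
    print("saved:", saved_fraction(4096))                       # 31/32
```

On this toy data both rankings agree, while each code occupies one bit per dimension instead of 32. On real features, binarization can cost some retrieval accuracy; the paper's own accuracy figures are not reproduced here.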
