Deep Visual-Semantic Quantization for Efficient Image Retrieval

Compact coding has been widely applied to approximate nearest neighbor search for large-scale image retrieval, due to its computation efficiency and retrieval quality. This paper presents a compact coding solution with a focus on the deep learning to quantization approach, which improves retrieval quality by end-to-end representation learning and compact encoding and has already shown the superior performance over the hashing solutions for similarity retrieval. We propose Deep Visual-Semantic Quantization (DVSQ), which is the first approach to learning deep quantization models from labeled image data as well as the semantic information underlying general text domains. The main contribution lies in jointly learning deep visual-semantic embeddings and visual-semantic quantizers using carefully-designed hybrid networks and well-specified loss functions. DVSQ enables efficient and effective image retrieval by supporting maximum inner-product search, which is computed based on learned codebooks with fast distance table lookup. Comprehensive empirical evidence shows that DVSQ can generate compact binary codes and yield state-of-the-art similarity retrieval performance on standard benchmarks.

[1]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[2]  Jinhui Tang,et al.  Supervised Quantization for Similarity Search , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jianmin Wang,et al.  Collective Deep Quantization for Efficient Cross-Modal Retrieval , 2017, AAAI.

[4]  Sanjiv Kumar,et al.  Learning Binary Codes for High-Dimensional Data Using Bilinear Projections , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Jingdong Wang,et al.  Inner Product Similarity Search using Compositional Codes , 2014, ArXiv.

[6]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[7]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Jingdong Wang,et al.  Composite Quantization for Approximate Nearest Neighbor Search , 2014, ICML.

[9]  Jianmin Wang,et al.  Deep Hashing Network for Efficient Similarity Retrieval , 2016, AAAI.

[10]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[11]  Geoffrey E. Hinton,et al.  Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure , 2007, AISTATS.

[12]  David J. Fleet,et al.  Fast search in Hamming space with multi-index hashing , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[14]  Jianmin Wang,et al.  Correlation Hashing Network for Efficient Cross-Modal Retrieval , 2016, BMVC.

[15]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[16]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Hanjiang Lai,et al.  Simultaneous feature learning and hash coding with deep neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Shih-Fu Chang,et al.  Semi-Supervised Hashing for Large-Scale Search , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[21]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[22]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[23]  Shih-Fu Chang,et al.  Circulant Binary Embedding , 2014, ICML.

[24]  Minyi Guo,et al.  Supervised hashing with latent factor models , 2014, SIGIR.

[25]  Nicu Sebe,et al.  Content-based multimedia information retrieval: State of the art and challenges , 2006, TOMCCAP.

[26]  Philip S. Yu,et al.  Composite Correlation Quantization for Efficient Multimodal Retrieval , 2015, SIGIR.

[27]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[28]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[30]  Hanjiang Lai,et al.  Supervised Hashing for Image Retrieval via Image Representation Learning , 2014, AAAI.

[31]  Jiwen Lu,et al.  Deep hashing for compact binary codes learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Philip S. Yu,et al.  Deep Visual-Semantic Hashing for Cross-Modal Retrieval , 2016, KDD.

[33]  Wei Liu,et al.  Hashing with Graphs , 2011, ICML.

[34]  Victor Lempitsky,et al.  Additive Quantization for Extreme Vector Compression , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Trevor Darrell,et al.  Learning to Hash with Binary Reconstructive Embeddings , 2009, NIPS.

[36]  David J. Fleet,et al.  Minimal Loss Hashing for Compact Binary Codes , 2011, ICML.

[37]  Shiguang Shan,et al.  Deep Supervised Hashing for Fast Image Retrieval , 2016, International Journal of Computer Vision.

[38]  David J. Fleet,et al.  Hamming Distance Metric Learning , 2012, NIPS.

[39]  Shih-Fu Chang,et al.  Hash Bit Selection: A Unified Solution for Selection Problems in Hashing , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Heng Tao Shen,et al.  Hashing for Similarity Search: A Survey , 2014, ArXiv.

[41]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[42]  J. Besag On the Statistical Analysis of Dirty Pictures , 1986 .

[43]  Jianmin Wang,et al.  Deep Quantization Network for Efficient Image Retrieval , 2016, AAAI.

[44]  Jian Sun,et al.  Optimized Product Quantization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Wei Liu,et al.  Supervised Discrete Hashing , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).