Deep Progressive Hashing for Image Retrieval

Hashing is a widely adopted approach to approximate nearest neighbor search in large-scale image retrieval. Conventional learning-based hashing algorithms employ end-to-end representation learning, which is a one-off technique. Because of the tradeoff between efficiency and performance, conventional learning-based hashing methods must lengthen their codes to improve performance, which increases their computational complexity. To improve the efficiency of binary codes, and motivated by the “nonsalient-to-salient” attention scheme of humans, we propose a recursive hashing mechanism that maps progressively expanded salient regions to a series of binary codes. These salient regions are generated by two models: a conventional saliency model based on bottom-up, saliency-driven attention, and a semantic-guided saliency model based on top-down, task-driven attention. After obtaining the series of salient regions, we perform long-range temporal modeling over them with a graph-based recurrent deep network to obtain more refined representative features. Later output nodes inherit aggregated information from all previous nodes and extract discriminative features from more salient regions. As a result, the network captures more significant information and scales well. The proposed recursive hashing neural network, optimized by a triplet ranking loss, is end-to-end trainable. Extensive experiments on several image retrieval benchmarks show the scalability of our method and demonstrate its strong performance compared with state-of-the-art methods.
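The abstract names two ingredients without giving formulas: a triplet ranking loss and a series of binary codes compared stage by stage. The sketch below illustrates both under stated assumptions. It uses a common hinge form of the triplet loss on relaxed (real-valued) codes with squared Euclidean distance, thresholds at zero to binarize, and accumulates Hamming distance across stages. The function names, the margin value, and the cumulative-distance scheme are illustrative assumptions, not the paper's exact formulation.

```python
from typing import List, Sequence


def binarize(relaxed: Sequence[float]) -> List[int]:
    """Threshold real-valued outputs at 0 to get a {0, 1} code (assumed scheme)."""
    return [1 if v >= 0 else 0 for v in relaxed]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def triplet_ranking_loss(anchor: Sequence[float],
                         positive: Sequence[float],
                         negative: Sequence[float],
                         margin: float = 1.0) -> float:
    """Hinge-style triplet loss on relaxed codes (a common form, assumed here).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`, measured in squared Euclidean distance.
    """
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, margin + d_pos - d_neg)


def progressive_hamming(query_codes: Sequence[Sequence[int]],
                        item_codes: Sequence[Sequence[int]]) -> List[int]:
    """Cumulative Hamming distance after each stage of a code series.

    Each stage contributes one short code; using more stages refines the
    distance (hypothetical illustration of comparing a series of codes).
    """
    total = 0
    running = []
    for q, c in zip(query_codes, item_codes):
        total += hamming(q, c)
        running.append(total)
    return running


if __name__ == "__main__":
    anchor = [0.9, -0.8, 0.7, -0.6]
    positive = [0.8, -0.9, 0.6, -0.7]
    negative = [-0.7, 0.9, -0.8, 0.5]
    print(triplet_ranking_loss(anchor, positive, negative))
    print(hamming(binarize(anchor), binarize(negative)))
    print(progressive_hamming([[1, 0], [1, 1]], [[1, 1], [0, 1]]))
```

Under this scheme, retrieval could stop after the first few stages when a coarse ranking is enough, and use the full series only when finer discrimination is needed.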
