Weakly-Supervised Image Hashing through Masked Visual-Semantic Graph-based Reasoning

With the popularization of social websites, many methods have been proposed to exploit noisy tags for weakly-supervised image hashing. The main challenge lies in learning appropriate and sufficient information from these noisy tags. To address this issue, this work proposes a novel Masked visual-semantic Graph-based Reasoning Network, termed MGRN, to learn joint visual-semantic representations for image hashing. Specifically, for each image, MGRN constructs a relation graph to capture the interactions among its associated tags and performs reasoning with Graph Attention Networks (GAT). MGRN randomly masks out one tag and then makes the GAT predict the masked tag. This forces the GAT model to capture the dependence between the image and its associated tags, which helps alleviate the problem of noisy tags. As a result, MGRN can capture key tags and visual structures from images and learn well-aligned visual-semantic representations. Finally, an auto-encoder is leveraged to learn hash codes that preserve the local structure of the joint space, while a decoder reconstructs the joint visual-semantic representations from those hash codes. Experimental results on two widely-used benchmark datasets demonstrate the superiority of the proposed method for image retrieval compared with several state-of-the-art methods.
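The abstract does not give MGRN's equations, so the following is only a minimal NumPy sketch, under stated assumptions, of the three ingredients it names: graph attention over an image node and its tag nodes, a masked-tag prediction objective, and an auto-encoder whose sign-binarized bottleneck yields hash codes. The fully connected graph, the zero-vector mask, the single attention head, mean pooling for the joint representation, and all helper names (`build_graph`, `masked_tag_loss`, `encode`, `binarize`, `hamming`) are hypothetical choices for illustration. The weights are random and untrained, and the local-structure-preserving term on the codes is omitted.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def elu(x):
    # exp is evaluated on min(x, 0) so large positive inputs cannot overflow
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


def gat_layer(H, A, W, a):
    """Single-head graph attention layer in the style of Velickovic et al.

    H: (N, F) node features, A: (N, N) 0/1 adjacency with self-loops,
    W: (F, F2) shared projection, a: (2 * F2,) attention vector.
    """
    Z = H @ W
    f2 = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into a source and a target part
    e = leaky_relu((Z @ a[:f2])[:, None] + (Z @ a[f2:])[None, :])
    e = np.where(A > 0, e, -np.inf)          # attend only to graph neighbours
    e = e - e.max(axis=1, keepdims=True)     # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return elu(alpha @ Z), alpha


def build_graph(img_feat, tag_ids, tag_emb, P, mask_pos=None):
    """Hypothetical graph: node 0 is the projected image, nodes 1..T are tags."""
    tags = tag_emb[tag_ids].copy()
    if mask_pos is not None:
        tags[mask_pos] = 0.0                 # assumed [MASK] token: a zero vector
    H = np.vstack([img_feat @ P, tags])
    A = np.ones((H.shape[0], H.shape[0]))    # assumed fully connected, self-loops included
    return H, A


def masked_tag_loss(img_feat, tag_ids, mask_pos, tag_emb, W, a, P):
    """Cross-entropy of recovering the hidden tag from the masked node's output."""
    H, A = build_graph(img_feat, tag_ids, tag_emb, P, mask_pos)
    out, alpha = gat_layer(H, A, W, a)
    logits = out[1 + mask_pos] @ (tag_emb @ W).T    # score every vocabulary tag
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[tag_ids[mask_pos]], logits, alpha


def encode(z, W_enc):
    return np.tanh(z @ W_enc)                # continuous relaxation of sign()


def binarize(h):
    return np.where(h >= 0, 1, -1)           # sign(), with ties mapped to +1


def hamming(b1, b2):
    return int((b1.size - b1 @ b2) // 2)     # valid for codes in {-1, +1}


rng = np.random.default_rng(0)
V, D, D_IMG, K = 50, 16, 32, 8               # vocabulary, tag dim, image dim, code bits
tag_emb = rng.normal(size=(V, D))
W = rng.normal(scale=0.3, size=(D, D))
a = rng.normal(scale=0.3, size=2 * D)
P = rng.normal(scale=0.3, size=(D_IMG, D))
W_enc = rng.normal(scale=0.3, size=(D, K))
W_dec = rng.normal(scale=0.3, size=(K, D))

img = rng.normal(size=D_IMG)
tag_ids = np.array([3, 17, 42])
loss, logits, alpha = masked_tag_loss(img, tag_ids, 1, tag_emb, W, a, P)  # hide tag 17

# Joint representation: mean of the unmasked GAT node outputs (an assumption)
H, A = build_graph(img, tag_ids, tag_emb, P)
z = gat_layer(H, A, W, a)[0].mean(axis=0)
h = encode(z, W_enc)
recon_loss = np.mean((h @ W_dec - z) ** 2)   # decoder reconstructs z from the code
code = binarize(h)
print(f"masked-tag loss={loss:.3f}, recon loss={recon_loss:.3f}, code={code}")
```

In a setup like this, training would presumably minimise the masked-tag loss and the reconstruction loss jointly (together with the omitted structure-preserving term), while retrieval would only need `binarize(encode(z, W_enc))` and a ranking of database items by `hamming`.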
