ViT2Hash: Unsupervised Information-Preserving Hashing

Unsupervised image hashing, which maps images into binary codes without supervision, is a compressor with a high compression rate. Hence, how to preserve the meaningful information of the original data is a critical problem. Inspired by the large-scale vision pre-training model ViT, which has shown significant progress in learning visual representations, in this paper we propose a simple information-preserving compressor that fine-tunes the ViT model for the target unsupervised hashing task. Specifically, from pixels to continuous features, we first propose a feature-preserving module that takes a corrupted image as input and reconstructs the feature the pre-trained ViT model extracts from the complete image, so that the feature extractor focuses on preserving the meaningful information of the original data. Secondly, from continuous features to hash codes, we propose a hashing-preserving module, which aims to keep the semantic information of the pre-trained ViT model by using the proposed Kullback-Leibler divergence loss. In addition, a quantization loss and a similarity loss are added to minimize the quantization error. Our method is very simple and achieves significantly higher MAP on three benchmark image datasets.
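The abstract names four loss terms (feature reconstruction, Kullback-Leibler distillation, quantization, and similarity). Below is a minimal PyTorch sketch of how such an objective could be assembled; the tensor names, the similarity target, and the equal weighting of the terms are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def vit2hash_losses(feat_recon, feat_target, logits_student, logits_teacher,
                    codes_cont, sim_target, temperature=4.0):
    """Hedged sketch of the loss terms described in the abstract.

    feat_recon:     features predicted from the corrupted image, shape (B, D)
    feat_target:    features of the complete image from the frozen pre-trained ViT, shape (B, D)
    logits_student: semantic logits produced from the hash branch, shape (B, C)
    logits_teacher: semantic logits from the pre-trained ViT head, shape (B, C)
    codes_cont:     continuous (pre-binarization) hash outputs in [-1, 1], shape (B, K)
    sim_target:     pairwise similarity targets, e.g. cosine similarity of ViT features, shape (B, B)
    All names, shapes, and the equal weighting are assumptions; the paper may differ.
    """
    # Feature-preserving: reconstruct the complete-image ViT feature from the corrupted input.
    l_feat = F.mse_loss(feat_recon, feat_target)

    # Hashing-preserving: KL divergence keeps the semantic distribution of the pre-trained ViT.
    l_kl = F.kl_div(
        F.log_softmax(logits_student / temperature, dim=-1),
        F.softmax(logits_teacher / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Quantization: push continuous codes toward {-1, +1}.
    l_quant = (codes_cont.abs() - 1.0).pow(2).mean()

    # Similarity: code similarities should match feature-space similarities.
    codes_norm = F.normalize(codes_cont, dim=-1)
    l_sim = F.mse_loss(codes_norm @ codes_norm.t(), sim_target)

    return l_feat + l_kl + l_quant + l_sim
```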
