SCRATCH: A Scalable Discrete Matrix Factorization Hashing Framework for Cross-Modal Retrieval

In this paper, we present a novel supervised cross-modal hashing framework, namely Scalable disCRete mATrix faCtorization Hashing (SCRATCH). First, it utilizes collective matrix factorization on original features together with label semantic embedding, to learn the latent representations in a shared latent space. Thereafter, it generates binary hash codes based on the latent representations. During optimization, it avoids using a large $n\times n$ similarity matrix and generates hash codes discretely. Besides, based on different objective functions, learning strategy, and features, we further present three models in this framework, i.e., SCRATCH-o, SCRATCH-t, and SCRATCH-d. The first one is a one-step method, learning the hash functions and the binary codes in the same optimization problem. The second is a two-step method, which first generates the binary codes and then learns the hash functions based on the learned hash codes. The third one is a deep version of SCRATCH-t, which utilizes deep neural networks as hash functions. The extensive experiments on two widely used benchmark datasets demonstrate that SCRATCH-o and SCRATCH-t outperform some state-of-the-art shallow hashing methods for cross-modal retrieval. The SCRATCH-d also outperforms some state-of-the-art deep hashing models.

[1]  Heng Tao Shen,et al.  Hashing with Angular Reconstructive Embeddings , 2018, IEEE Transactions on Image Processing.

[2]  Xiangyang Ji,et al.  Learning Intra-Video Difference for Person Re-Identification , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[3]  Liqiang Nie,et al.  Fast Scalable Supervised Hashing , 2018, SIGIR.

[4]  Dongqing Zhang,et al.  Large-Scale Supervised Multimodal Hashing with Semantic Correlation Maximization , 2014, AAAI.

[5]  Gang Hua,et al.  Supervised Matrix Factorization for Cross-Modality Hashing , 2016, IJCAI.

[6]  Ling Shao,et al.  Discretely Coding Semantic Rank Orders for Supervised Image Hashing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Guiguang Ding,et al.  Latent semantic sparse hashing for cross-modal similarity search , 2014, SIGIR.

[8]  Xuelong Li,et al.  Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search , 2016, IEEE Transactions on Image Processing.

[9]  Wei Liu,et al.  Discrete Graph Hashing , 2014, NIPS.

[10]  Zhi-Hua Zhou,et al.  Column Sampling Based Discrete Supervised Hashing , 2016, AAAI.

[11]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[12]  P. Schönemann,et al.  A generalized solution of the orthogonal procrustes problem , 1966 .

[13]  Nicu Sebe,et al.  Quantization-based hashing: a general framework for scalable image and video retrieval , 2018, Pattern Recognit..

[14]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Junsong Yuan,et al.  Fried Binary Embedding for High-Dimensional Visual Features , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17]  Heng Tao Shen,et al.  Unsupervised Deep Hashing with Similarity-Adaptive and Discrete Optimization , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Yang Yang,et al.  Zero-Shot Hashing via Transferring Supervised Knowledge , 2016, ACM Multimedia.

[19]  Meng Wang,et al.  Person Re-Identification With Metric Learning Using Privileged Information , 2018, IEEE Transactions on Image Processing.

[20]  Yue Gao,et al.  Large-Scale Cross-Modality Search via Collective Matrix Factorization Hashing , 2016, IEEE Transactions on Image Processing.

[21]  Wei Liu,et al.  Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Xuelong Li,et al.  Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval , 2017, IEEE Transactions on Image Processing.

[23]  Zi Huang,et al.  Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval , 2013, IEEE Transactions on Multimedia.

[24]  Heng Tao Shen,et al.  Video Captioning by Adversarial LSTM , 2018, IEEE Transactions on Image Processing.

[25]  Jianmin Wang,et al.  Semantics-preserving hashing for cross-view retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Yueting Zhuang,et al.  Multimodal Deep Embedding via Hierarchical Grounded Compositional Semantics , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[27]  Jian Wang,et al.  Linear unsupervised hashing for ANN search in Euclidean space , 2016, Neurocomputing.

[28]  Guiguang Ding,et al.  Collective Matrix Factorization Hashing for Multimodal Data , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Wu-Jun Li,et al.  Deep Cross-Modal Hashing , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Bo Zhang,et al.  Scalable Discrete Supervised Multimedia Hash Learning With Clustering , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[32]  Jinhui Tang,et al.  Weakly Supervised Multimodal Hashing for Scalable Social Image Retrieval , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[33]  Xin Luo,et al.  Discrete Hashing With Multiple Supervision , 2019, IEEE Transactions on Image Processing.

[34]  Jiwen Lu,et al.  Deep Hashing via Discrepancy Minimization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[36]  Quan Wang,et al.  Robust and Flexible Discrete Hashing for Cross-Modal Similarity Search , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[37]  Xiaodong Yu,et al.  Learning Bidirectional Temporal Cues for Video-Based Person Re-Identification , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[38]  Philip S. Yu,et al.  Composite Correlation Quantization for Efficient Multimodal Retrieval , 2015, SIGIR.

[39]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[40]  Jiwen Lu,et al.  Deep hashing for compact binary codes learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Yang Yang,et al.  Adversarial Cross-Modal Retrieval , 2017, ACM Multimedia.

[42]  Wei Zhang,et al.  SCRATCH: A Scalable Discrete Matrix Factorization Hashing for Cross-Modal Retrieval , 2018, ACM Multimedia.

[43]  Wei Liu,et al.  Supervised Discrete Hashing , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Kristen Grauman,et al.  Kernelized locality-sensitive hashing for scalable image search , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[45]  Nicu Sebe,et al.  A Survey on Learning to Hash , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Zi Huang,et al.  Inter-media hashing for large-scale retrieval from heterogeneous data sources , 2013, SIGMOD '13.

[47]  Prateek Jain,et al.  Fast Similarity Search for Learned Metrics , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Rongrong Ji,et al.  Cross-Modality Binary Code Learning via Fusion Similarity Hashing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Jiwen Lu,et al.  Nonlinear Structural Hashing for Scalable Video Search , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[50]  Jason Gu,et al.  A Feature Descriptor Based on Local Normalized Difference for Real-World Texture Classification , 2018, IEEE Transactions on Multimedia.

[51]  Philip S. Yu,et al.  HashNet: Deep Learning to Hash by Continuation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[52]  Xuelong Li,et al.  Robust Discrete Spectral Hashing for Large-Scale Image Semantic Indexing , 2015, IEEE Transactions on Big Data.

[53]  Meng Jian,et al.  Interactive image retrieval using constraints , 2015, Neurocomputing.

[54]  Wei Liu,et al.  Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval , 2017, AAAI.

[55]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[56]  Xin-Shun Xu,et al.  Scalable Supervised Discrete Hashing for Large-Scale Search , 2018, WWW.

[57]  Huimin Lu,et al.  Unsupervised cross-modal retrieval through adversarial learning , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[58]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[59]  Bingbing Ni,et al.  Facilitating Image Search With a Scalable and Compact Semantic Mapping , 2015, IEEE Transactions on Cybernetics.

[60]  Bingpeng Ma,et al.  Video-Based Pedestrian Re-Identification by Adaptive Spatio-Temporal Appearance Model , 2017, IEEE Transactions on Image Processing.

[61]  Xin Huang,et al.  An Overview of Cross-Media Retrieval: Concepts, Methodologies, Benchmarks, and Challenges , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[62]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[63]  Wei Liu,et al.  Asymmetric Binary Coding for Image Search , 2017, IEEE Transactions on Multimedia.