SCRATCH: A Scalable Discrete Matrix Factorization Hashing for Cross-Modal Retrieval

In recent years, many hashing methods have been proposed for the cross-modal retrieval task. However, there are still some issues that need to be further explored. For example, some of them relax the binary constraints to generate the hash codes, which may generate large quantization error. Although some discrete schemes have been proposed, most of them are time-consuming. In addition, most of the existing supervised hashing methods use an n x n similarity matrix during the optimization, making them unscalable. To address these issues, in this paper, we present a novel supervised cross-modal hashing method---Scalable disCRete mATrix faCtorization Hashing, SCRATCH for short. It leverages the collective matrix factorization on the kernelized features and the semantic embedding with labels to find a latent semantic space to preserve the intra- and inter-modality similarities. In addition, it incorporates the label matrix instead of the similarity matrix into the loss function. Based on the proposed loss function and the iterative optimization algorithm, it can learn the hash functions and binary codes simultaneously. Moreover, the binary codes can be generated discretely, reducing the quantization error generated by the relaxation scheme. Its time complexity is linear to the size of the dataset, making it scalable to large-scale datasets. Extensive experiments on three benchmark datasets, namely, Wiki, MIRFlickr-25K, and NUS-WIDE, have verified that our proposed SCRATCH model outperforms several state-of-the-art unsupervised and supervised hashing methods for cross-modal retrieval.

[1]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[2]  Xuelong Li,et al.  Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search , 2016, IEEE Transactions on Image Processing.

[3]  P. Schönemann,et al.  A generalized solution of the orthogonal procrustes problem , 1966 .

[4]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[5]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[6]  Liqiang Nie,et al.  Fast Scalable Supervised Hashing , 2018, SIGIR.

[7]  Gang Hua,et al.  Supervised Matrix Factorization for Cross-Modality Hashing , 2016, IJCAI.

[8]  Zhi-Hua Zhou,et al.  Column Sampling Based Discrete Supervised Hashing , 2016, AAAI.

[9]  Xin-Shun Xu Dictionary Learning Based Hashing for Cross-Modal Retrieval , 2016, ACM Multimedia.

[10]  Zi Huang,et al.  Robust Hashing With Local Models for Approximate Similarity Search , 2014, IEEE Transactions on Cybernetics.

[11]  Wei Liu,et al.  Asymmetric Binary Coding for Image Search , 2017, IEEE Transactions on Multimedia.

[12]  Rongrong Ji,et al.  Cross-Modality Binary Code Learning via Fusion Similarity Hashing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Zi Huang,et al.  Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval , 2013, IEEE Transactions on Multimedia.

[14]  Bingbing Ni,et al.  Facilitating Image Search With a Scalable and Compact Semantic Mapping , 2015, IEEE Transactions on Cybernetics.

[15]  Qi Tian,et al.  Linear Distance Preserving Pseudo-Supervised and Unsupervised Hashing , 2016, ACM Multimedia.

[16]  Meng Wang,et al.  Harvesting visual concepts for image search with complex queries , 2012, ACM Multimedia.

[17]  Guiguang Ding,et al.  Latent semantic sparse hashing for cross-modal similarity search , 2014, SIGIR.

[18]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[19]  Dongqing Zhang,et al.  Large-Scale Supervised Multimodal Hashing with Semantic Correlation Maximization , 2014, AAAI.

[20]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[21]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Meng Wang,et al.  Play and Rewind: Optimizing Binary Representations of Videos by Self-Supervised Temporal Hashing , 2016, ACM Multimedia.

[23]  Yue Gao,et al.  Predicting Microblog Sentiments via Weakly Supervised Multimodal Deep Learning , 2018, IEEE Transactions on Multimedia.

[24]  Nikos Paragios,et al.  Data fusion through cross-modality metric learning using similarity-sensitive hashing , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Nicu Sebe,et al.  Quantization-based hashing: a general framework for scalable image and video retrieval , 2018, Pattern Recognit..

[26]  Jian Sun,et al.  K-Means Hashing: An Affinity-Preserving Quantization Method for Learning Binary Compact Codes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Zi Huang,et al.  Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval , 2016, IEEE Transactions on Multimedia.

[28]  Yang Yang,et al.  Zero-Shot Hashing via Transferring Supervised Knowledge , 2016, ACM Multimedia.

[29]  Philip S. Yu,et al.  Deep Visual-Semantic Hashing for Cross-Modal Retrieval , 2016, KDD.

[30]  Xin-Shun Xu,et al.  Scalable Supervised Discrete Hashing for Large-Scale Search , 2018, WWW.

[31]  Meng Wang,et al.  Neighborhood Discriminant Hashing for Large-Scale Image Retrieval , 2015, IEEE Transactions on Image Processing.

[32]  Pengfei Zhang,et al.  Semi-Relaxation Supervised Hashing for Cross-Modal Retrieval , 2017, ACM Multimedia.

[33]  Yang Yang,et al.  Adversarial Cross-Modal Retrieval , 2017, ACM Multimedia.

[34]  Hanjiang Lai,et al.  Supervised Hashing for Image Retrieval via Image Representation Learning , 2014, AAAI.

[35]  Liqiang Nie,et al.  Dual Deep Neural Networks Cross-Modal Hashing , 2018, AAAI.

[36]  Roger Levy,et al.  A new approach to cross-modal multimedia retrieval , 2010, ACM Multimedia.

[37]  Nicu Sebe,et al.  A Survey on Learning to Hash , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Jürgen Schmidhuber,et al.  Multimodal Similarity-Preserving Hashing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[40]  Yongxin Wang,et al.  SDMCH: Supervised Discrete Manifold-Embedded Cross-Modal Hashing , 2018, IJCAI.

[41]  Yi Zhen,et al.  A probabilistic model for multimodal hash function learning , 2012, KDD.

[42]  Geoffrey E. Hinton,et al.  Stochastic Neighbor Embedding , 2002, NIPS.

[43]  Philip S. Yu,et al.  Composite Correlation Quantization for Efficient Multimodal Retrieval , 2015, SIGIR.

[44]  Yue Gao,et al.  Large-Scale Cross-Modality Search via Collective Matrix Factorization Hashing , 2016, IEEE Transactions on Image Processing.

[45]  Xuelong Li,et al.  Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval , 2017, IEEE Transactions on Image Processing.

[46]  Jianmin Wang,et al.  Semantics-preserving hashing for cross-view retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Jian Wang,et al.  Linear unsupervised hashing for ANN search in Euclidean space , 2016, Neurocomputing.

[48]  Guiguang Ding,et al.  Collective Matrix Factorization Hashing for Multimodal Data , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Wu-Jun Li,et al.  Deep Cross-Modal Hashing , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Guiguang Ding,et al.  Large-Scale Cross-Modality Search via Collective Matrix Factorization Hashing. , 2016, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[51]  Rui Yang,et al.  Discrete Multi-view Hashing for Effective Image Retrieval , 2017, ICMR.