Fast Semantic Preserving Hashing for Large-Scale Cross-Modal Retrieval

Most cross-modal hashing methods do not sufficiently exploit the discriminative power of semantic information when learning hash codes, and they often involve time-consuming training procedures on large-scale datasets. To tackle these issues, we first formulate the learning of similarity-preserving hash codes as orthogonally rotating the semantic data into Hamming space, and then propose a novel Fast Semantic Preserving Hashing (FSePH) approach to large-scale cross-modal retrieval. Specifically, FSePH introduces an orthonormal basis to regress the target hash codes of training examples onto their corresponding, reasonably relaxed class labels, which significantly reduces the quantization error. Meanwhile, we derive an effective optimization algorithm for learning the modality-specific projection functions and an efficient closed-form solution for learning the hash codes, both of which are computationally tractable. Extensive experiments show that the proposed FSePH approach runs fast and significantly improves retrieval performance over state-of-the-art methods.
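
As a rough illustration of what "orthogonally rotating the semantic data into Hamming space" can look like, the block below writes out a generic rotation-based quantization problem and its standard alternating updates. The abstract does not give the objective, so this is a sketch under assumed notation and may differ from the actual FSePH formulation: \tilde{Y} (relaxed label matrix, assumed here to have r columns), R (orthogonal rotation), B (binary codes), n (number of training examples), and r (code length) are all hypothetical symbols introduced for this example.

```latex
% Assumed notation (not from the abstract):
%   \tilde{Y} \in \mathbb{R}^{n \times r}  relaxed semantic (label) data
%   R \in \mathbb{R}^{r \times r}          orthogonal rotation, R^\top R = I
%   B \in \{-1,+1\}^{n \times r}           hash codes of the n training examples
\begin{align}
  \min_{B,\,R}\;& \bigl\| B - \tilde{Y} R \bigr\|_F^2
  \quad \text{s.t.}\quad R^\top R = I,\;\; B \in \{-1,+1\}^{n \times r}. \\[4pt]
  % Step 1: with R fixed, each entry of B is chosen independently.
  \text{Fix } R:\quad & B = \operatorname{sign}\bigl(\tilde{Y} R\bigr). \\[4pt]
  % Step 2: with B fixed, this is an orthogonal Procrustes problem.
  \text{Fix } B:\quad & \max_{R^\top R = I} \operatorname{tr}\bigl(R^\top \tilde{Y}^\top B\bigr),
  \qquad \tilde{Y}^\top B = U \Sigma V^\top
  \;\Longrightarrow\; R = U V^\top .
\end{align}
```

Both updates are closed-form, which is why schemes of this kind are inexpensive per iteration: the code update is an elementwise sign, and the rotation update needs only the SVD of an r-by-r matrix once \tilde{Y}^\top B has been formed.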
