Collective Reconstructive Embeddings for Cross-Modal Hashing

In this paper, we study the problem of cross-modal retrieval using hashing-based approximate nearest neighbor search. Most existing cross-modal hashing methods address the complexity of multi-modal integration by applying the same mapping and similarity calculation to data from different media types. However, this can cause information loss during mapping, because the specifics of each individual modality are overlooked. We propose a simple yet effective cross-modal hashing approach, termed collective reconstructive embeddings (CRE), which simultaneously addresses the heterogeneity and the integration complexity of multi-modal data. To address the heterogeneity challenge, we process heterogeneous types of data with different modality-specific models. Specifically, we model textual data with a cosine similarity-based reconstructive embedding to alleviate data sparsity as far as possible, while for image data we use the Euclidean distance to characterize the relationships of the projected hash codes. Meanwhile, we unify the projections of text and image into the Hamming space as a common reconstructive embedding through rigorous mathematical reformulation, which not only reduces the optimization complexity significantly but also facilitates inter-modal similarity preservation across modalities. We further incorporate the code balance and uncorrelation criteria into the problem and devise an efficient iterative algorithm for optimization. Comprehensive experiments on four widely used multimodal benchmarks show that the proposed CRE achieves superior performance compared with the state of the art on several challenging cross-modal tasks.
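
As an illustration of the criteria mentioned above, the following is a minimal sketch (not the CRE algorithm itself) of what code balance and uncorrelation mean for hash codes with entries in {-1, +1}, together with the standard identity linking Hamming distance to the inner product, d_H(a, b) = (r - a·b) / 2 for r-bit codes. The function names and the toy codes are hypothetical and chosen only for this example.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions where two {-1,+1} codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def inner(a, b):
    """Inner product of two codes of equal length."""
    return sum(x * y for x, y in zip(a, b))

def is_balanced(codes):
    """Balance: every bit takes +1 and -1 equally often over the dataset."""
    r = len(codes[0])
    return all(sum(c[k] for c in codes) == 0 for k in range(r))

def is_uncorrelated(codes):
    """Uncorrelation: distinct bit columns are orthogonal over the dataset."""
    r = len(codes[0])
    return all(
        sum(c[j] * c[k] for c in codes) == 0
        for j, k in combinations(range(r), 2)
    )

# Toy example (assumed data): 4 items, 2-bit codes.
codes = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

if __name__ == "__main__":
    print("balanced:", is_balanced(codes))
    print("uncorrelated:", is_uncorrelated(codes))
    for a, b in combinations(codes, 2):
        # Hamming distance and inner product carry the same information.
        print(a, b, hamming(a, b), (len(a) - inner(a, b)) // 2)
```

Because the inner product of two r-bit codes determines their Hamming distance, reconstructing cosine or Euclidean relationships with binary codes can be phrased in terms of code inner products, which is the general idea behind reconstructive embeddings.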
