Discrete semantic embedding hashing for scalable cross-modal retrieval

Cross-modal hashing has attracted considerable attention for cross-modal retrieval and has achieved promising performance. However, many existing cross-modal hashing methods construct pairwise similarities to represent the relationships among heterogeneous data, which incurs substantial computation time and storage space and makes them unscalable to large-scale retrieval tasks. In this paper, we propose a novel supervised method, Discrete Semantic Embedding Hashing (DSEH), for cross-modal retrieval. Specifically, we first learn a common representation of the heterogeneous data by embedding the semantic labels into a collective matrix factorization, so that both intra- and inter-modality similarities are well captured. We then learn the hash codes directly in the discrete space from the learned common representation via an orthogonal rotation technique. Moreover, we learn multi-modal hash functions that efficiently convert out-of-sample instances into unified hash codes. Extensive experimental results on three widely used benchmark databases demonstrate the superiority of the proposed DSEH over previous state-of-the-art methods.
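
To make the orthogonal-rotation step concrete, the following is a minimal illustrative sketch (not the paper's exact formulation): given a real-valued common representation V learned from the collective matrix factorization, binary codes B in {-1, +1} and an orthogonal rotation R are updated alternately to minimize the quantization error ||B - V R||_F^2, in the spirit of iterative-quantization-style schemes. The function name and iteration count below are hypothetical.

    import numpy as np

    def orthogonal_rotation_quantize(V, n_iter=50, seed=0):
        # Illustrative sketch: alternate between updating binary codes B
        # and an orthogonal rotation R to reduce ||B - V R||_F^2.
        rng = np.random.default_rng(seed)
        c = V.shape[1]
        # Initialize with a random orthogonal rotation.
        R, _ = np.linalg.qr(rng.standard_normal((c, c)))
        for _ in range(n_iter):
            # Fix R, update codes by element-wise sign of the rotated embedding.
            B = np.sign(V @ R)
            B[B == 0] = 1
            # Fix B, update R via the orthogonal Procrustes solution:
            # SVD of V^T B = U S W^T gives the optimal rotation R = U W^T.
            U, _, Wt = np.linalg.svd(V.T @ B)
            R = U @ Wt
        return np.sign(V @ R), R

Given such codes, out-of-sample instances can then be mapped to the unified hash codes by modality-specific hash functions fitted against B, e.g., by regressing each modality's features onto the learned codes.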