Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval

Most recent approaches for the zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance by using a pre-trained model. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted during the training of a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mining more reliable relationships between images by traversing heterogeneous manifolds in the feature space of each modality. Our proposed method benefits from intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieved great improvement in the performance of the thermal v.s. visible image retrieval task. The code of this paper: https://github.com/fyang93/cross-modal-retrieval

[1]  Jie Li,et al.  HSME: Hypersphere Manifold Embedding for Visible Thermal Person Re-Identification , 2019, AAAI.

[2]  Yannis Avrithis,et al.  Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Philip S. Yu,et al.  Deep Visual-Semantic Hashing for Cross-Modal Retrieval , 2016, KDD.

[4]  Shin'ichi Satoh,et al.  Efficient Image Retrieval via Decoupling Diffusion into Online and Offline Processing , 2018, AAAI.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[7]  Bernhard Schölkopf,et al.  Ranking on Data Manifolds , 2003, NIPS.

[8]  Pong C. Yuen,et al.  Hierarchical Discriminative Learning for Visible Thermal Person Re-Identification , 2018, AAAI.

[9]  Horst Bischof,et al.  Diffusion Processes for Retrieval Revisited , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Zheng Wang,et al.  Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking , 2018, IJCAI.

[11]  Jian-Huang Lai,et al.  RGB-Infrared Cross-Modality Person Re-identification , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Rongrong Ji,et al.  Cross-Modality Person Re-Identification with Generative Adversarial Training , 2018, IJCAI.

[13]  Yang Yang,et al.  Adversarial Cross-Modal Retrieval , 2017, ACM Multimedia.

[14]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[15]  Arun Ross,et al.  Relating ROC and CMC curves via the biometric menagerie , 2013, 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS).

[16]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Qi Tian,et al.  Regularized Diffusion Process for Visual Retrieval , 2017, AAAI.

[18]  Shin'ichi Satoh,et al.  Cascaded SR-GAN for Scale-Adaptive Low Resolution Person Re-identification , 2018, IJCAI.

[19]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[20]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[21]  Tien Dat Nguyen,et al.  Person Recognition System Based on a Combination of Body Images from Visible Light and Thermal Cameras , 2017, Sensors.

[22]  Yung-Yu Chuang,et al.  Learning to Reduce Dual-Level Discrepancy for Infrared-Visible Person Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Yannis Avrithis,et al.  Mining on Manifolds: Metric Learning Without Labels , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Jianmin Wang,et al.  Collective Deep Quantization for Efficient Cross-Modal Retrieval , 2017, AAAI.

[25]  Qi Tian,et al.  Ensemble Diffusion for Retrieval , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.