Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval

Most recent approaches to zero-shot cross-modal image retrieval use a pre-trained model to map images from different modalities into a uniform feature space, where their relevance can be measured. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted when training a two-stream model that simply maps images from different modalities into a uniform space. This distortion directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme that mines more reliable relationships between images by traversing heterogeneous manifolds in the feature space of each modality. The proposed method exploits intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieve a substantial improvement in performance on the thermal vs. visible image retrieval task. Code is available at https://github.com/fyang93/cross-modal-retrieval
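The abstract does not spell out the bi-directional scheme, so the following is only a minimal sketch of the general idea it builds on: a random walk with restart over a k-nearest-neighbor graph built within a single modality, seeded by (possibly noisy) cross-modal similarities. The function names `knn_affinity` and `random_walk_scores`, the cosine-similarity graph, and the parameters `k`, `alpha`, and `iters` are all illustrative assumptions and are not taken from the paper or its repository.

```python
import numpy as np


def knn_affinity(features, k):
    """Build a symmetric kNN affinity matrix from cosine similarity.

    Hypothetical helper: the graph construction is an assumption,
    not the procedure used in the paper.
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = np.clip(x @ x.T, 0.0, None)   # keep non-negative similarities only
    np.fill_diagonal(sim, 0.0)          # no self-loops
    # Keep the k largest similarities in each row.
    idx = np.argsort(-sim, axis=1)[:, :k]
    mask = np.zeros(sim.shape, dtype=bool)
    rows = np.arange(sim.shape[0])[:, None]
    mask[rows, idx] = True
    w = np.where(mask, sim, 0.0)
    return np.maximum(w, w.T)           # symmetrize


def random_walk_scores(w, seed, alpha=0.85, iters=50):
    """Random walk with restart: f <- alpha * S f + (1 - alpha) * seed.

    S is the symmetrically normalized affinity D^{-1/2} W D^{-1/2}.
    `seed` would hold the query's cross-modal similarities to the
    gallery images (an assumption made for this sketch).
    """
    d = w.sum(axis=1)
    d[d == 0] = 1.0                     # guard against isolated nodes
    inv_sqrt = 1.0 / np.sqrt(d)
    s = w * inv_sqrt[:, None] * inv_sqrt[None, :]
    seed = np.asarray(seed, dtype=float)
    f = seed.copy()
    for _ in range(iters):
        f = alpha * (s @ f) + (1.0 - alpha) * seed
    return f
```

In this sketch the walk runs in one direction only, over the gallery's intra-modal graph. Gallery images that lie on the same manifold as the seeded images receive a score even when their direct cross-modal similarity to the query is low, which is the kind of intra-modal support the abstract refers to. A bi-directional variant would presumably also traverse the graph of the query modality, but how the two walks are combined is not stated here.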
