Multi-level network based on transformer encoder for fine-grained image–text matching
[1] Joemon M. Jose, et al. Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval, 2021, ACM Multimedia.
[2] Liqiang Nie, et al. Dynamic Modality Interaction Modeling for Image-Text Retrieval, 2021, SIGIR.
[3] Houqiang Li, et al. Deep Relation Embedding for Cross-Modal Retrieval, 2020, IEEE Transactions on Image Processing.
[4] Liqiang Nie, et al. Context-Aware Multi-View Summarization Network for Image-Text Matching, 2020, ACM Multimedia.
[5] Andrea Esuli, et al. Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders, 2020, ACM Trans. Multim. Comput. Commun. Appl.
[6] Yuxin Peng, et al. MAVA: Multi-Level Adaptive Visual-Textual Alignment by Cross-Media Bi-Attention Mechanism, 2019, IEEE Transactions on Image Processing.
[7] Qingming Huang, et al. Learning Fragment Self-Attention Embeddings for Image-Text Matching, 2019, ACM Multimedia.
[8] Yi Li, et al. Learning discriminative representations for semantical crossmodal retrieval, 2018, Multimedia Systems.
[9] Qingming Huang, et al. Learning Semantic Structure-preserved Embeddings for Cross-modal Retrieval, 2018, ACM Multimedia.
[10] Amit K. Roy-Chowdhury, et al. Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval, 2018, ACM Multimedia.
[11] Qi Tian, et al. Multi-Networks Joint Learning for Large-Scale Cross-Modal Retrieval, 2017, ACM Multimedia.
[12] Yang Yang, et al. Adversarial Cross-Modal Retrieval, 2017, ACM Multimedia.
[13] Yuxin Peng, et al. CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning, 2017, ArXiv.
[14] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Ruifan Li, et al. Cross-modal Retrieval with Correspondence Autoencoder, 2014, ACM Multimedia.
[16] Xinhang Song, et al. Relative image similarity learning with contextual information for Internet cross-media retrieval, 2014, Multimedia Systems.
[17] A. Yazıcı, et al. RELIEF-MM: effective modality weighting for multimedia information retrieval, 2014, Multimedia Systems.
[18] Peter Young, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014, TACL.
[19] Roger Levy, et al. A new approach to cross-modal multimedia retrieval, 2010, ACM Multimedia.
[20] John Shawe-Taylor, et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods, 2004, Neural Computation.
[21] Li Fei-Fei, et al. Deep visual-semantic alignments for generating image descriptions, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).