Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval
Fuhai Chen | J. Jose | Fuxiang Tao | Xuri Ge | Songpei Xu
[1] I. Ounis,et al. Multi-modal Graph Contrastive Learning for Micro-video Recommendation , 2022, SIGIR.
[2] Yongdong Zhang,et al. Negative-Aware Attention Framework for Image-Text Matching , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Fei Wen,et al. Cross-modal Graph Matching Network for Image-text Retrieval , 2022, ACM Trans. Multim. Comput. Commun. Appl..
[4] Y. Fu,et al. Image-Text Embedding Learning via Visual and Textual Semantic Reasoning , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] Xiaojun Wan,et al. GraDual: Graph-based Dual-modal Representation for Image-Text Matching , 2022, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
[6] Joemon M. Jose,et al. Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval , 2021, ACM Multimedia.
[7] Liqiang Nie,et al. Dynamic Modality Interaction Modeling for Image-Text Retrieval , 2021, SIGIR.
[8] Zhong Ji,et al. Step-Wise Hierarchical Alignment Network for Image-Text Matching , 2021, IJCAI.
[9] Cathal Gurrin,et al. A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval , 2021, SoMeT.
[10] Huchuan Lu,et al. Similarity Reasoning and Filtration for Image-Text Matching , 2021, AAAI.
[11] Yuning Jiang,et al. Learning the Best Pooling Strategy for Visual Semantic Embedding , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Qingrong Cheng,et al. Learning Dual Semantic Relations With Graph Attention for Image-Text Matching , 2020, IEEE Transactions on Circuits and Systems for Video Technology.
[13] Liqiang Nie,et al. Context-Aware Multi-View Summarization Network for Image-Text Matching , 2020, ACM Multimedia.
[14] Zhong Ji,et al. Consensus-Aware Visual-Semantic Embedding for Image-Text Matching , 2020, ECCV.
[15] Qi Zhang,et al. Context-Aware Attention Network for Image-Text Retrieval , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Rodrigo C. Barros,et al. Adaptive Cross-Modal Embeddings for Image-Text Alignment , 2020, AAAI.
[17] Chunxiao Liu,et al. Graph Structured Network for Image-Text Matching , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Ji Liu,et al. IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Qingming Huang,et al. Learning Fragment Self-Attention Embeddings for Image-Text Matching , 2019, ACM Multimedia.
[20] Xilin Chen,et al. Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[21] Yongdong Zhang,et al. Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching , 2019, ACM Multimedia.
[22] Yun Fu,et al. Visual Semantic Reasoning for Image-Text Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Dezhong Peng,et al. Deep Supervised Cross-Modal Retrieval , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Jungong Han,et al. Saliency-Guided Attention Network for Image-Sentence Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[25] Xi Chen,et al. Stacked Cross Attention for Image-Text Matching , 2018, ECCV.
[26] Yan Huang,et al. Learning Semantic Concepts and Order for Image and Sentence Matching , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[27] Yejin Choi,et al. Neural Motifs: Scene Graph Parsing with Global Context , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[28] Zhedong Zheng,et al. Dual-path Convolutional Image-Text Embeddings with Instance Loss , 2017, ACM Trans. Multim. Comput. Commun. Appl..
[29] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[30] David J. Fleet,et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives , 2017, BMVC.
[31] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[32] Liwei Wang,et al. Learning Two-Branch Neural Networks for Image-Text Matching Tasks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[33] Tao Mei,et al. Boosting Image Captioning with Attributes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[34] Jung-Woo Ha,et al. Dual Attention Networks for Multimodal Reasoning and Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Qi Wu,et al. Image Captioning and Visual Question Answering Based on Attributes and External Knowledge , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] Yin Li,et al. Learning Deep Structure-Preserving Image-Text Embeddings , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Sanja Fidler,et al. Order-Embeddings of Images and Language , 2015, ICLR.
[38] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[39] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[40] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[41] Wei Xu,et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.
[42] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[45] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[46] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[47] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[48] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[49] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[50] Rongrong Ji,et al. Variational Structured Semantic Inference for Diverse Image Captioning , 2019, NeurIPS.