暂无分享,去创建一个
[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[2] Jianfeng Gao,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2020, AAAI.
[3] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Heng Tao Shen,et al. Cross-Modal Attention With Semantic Consistence for Image–Text Matching , 2020, IEEE Transactions on Neural Networks and Learning Systems.
[5] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Xin Huang,et al. An Overview of Cross-Media Retrieval: Concepts, Methodologies, Benchmarks, and Challenges , 2017, IEEE Transactions on Circuits and Systems for Video Technology.
[8] David J. Fleet,et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives , 2017, BMVC.
[9] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[10] Yuxin Peng,et al. Fine-Grained Visual-Textual Representation Learning , 2020, IEEE Transactions on Circuits and Systems for Video Technology.
[11] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[12] Max Welling,et al. Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.
[13] Gang Wang,et al. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[14] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[15] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[16] Xiu-Shen Wei,et al. Multi-Label Image Recognition With Graph Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[18] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[19] Xilin Chen,et al. Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[20] Yang Yang,et al. Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking , 2019, ACM Multimedia.
[21] Zhizhong Han,et al. CMPD: Using Cross Memory Network With Pair Discrimination for Image-Text Retrieval , 2020, IEEE Transactions on Circuits and Systems for Video Technology.
[22] Ah Chung Tsoi,et al. The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.
[23] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[24] Takashi Matsubara,et al. Target-Oriented Deformation of Visual-Semantic Embedding Space , 2019, IEICE Trans. Inf. Syst..
[25] Huifang Ma,et al. Combining Global and Local Similarity for Cross-Media Retrieval , 2020, IEEE Access.
[26] Xin Huang,et al. Visual-Textual Hybrid Sequence Matching for Joint Reasoning. , 2020, IEEE transactions on cybernetics.
[27] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[28] Yuxin Peng,et al. Unsupervised Cross-Media Retrieval Using Domain Adaptation With Scene Graph , 2020, IEEE Transactions on Circuits and Systems for Video Technology.
[29] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[30] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[31] Wei Liu,et al. Matching Image and Sentence With Multi-Faceted Representations , 2020, IEEE Transactions on Circuits and Systems for Video Technology.
[32] Yejin Choi,et al. Neural Motifs: Scene Graph Parsing with Global Context , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[33] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Yun Fu,et al. Visual Semantic Reasoning for Image-Text Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[35] Xueming Qian,et al. Position Focused Attention Network for Image-Text Matching , 2019, IJCAI.
[36] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[37] Yuxin Peng,et al. CM-GANs , 2019, ACM Trans. Multim. Comput. Commun. Appl..
[38] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[39] Yin Li,et al. Learning Deep Structure-Preserving Image-Text Embeddings , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[41] Li Fei-Fei,et al. Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval , 2015, VL@EMNLP.
[42] Zhoujun Li,et al. Bi-Directional Spatial-Semantic Attention Networks for Image-Text Matching , 2019, IEEE Transactions on Image Processing.
[43] Pietro Liò,et al. Graph Attention Networks , 2017, ICLR.
[44] John Shawe-Taylor,et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.
[45] Jung-Woo Ha,et al. Dual Attention Networks for Multimodal Reasoning and Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Serge J. Belongie,et al. Learning to Evaluate Image Captioning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[48] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[49] Lin Ma,et al. Bidirectional image-sentence retrieval by local and global deep matching , 2019, Neurocomputing.
[50] Jeff A. Bilmes,et al. Deep Canonical Correlation Analysis , 2013, ICML.
[51] Yu Cheng,et al. Relation-Aware Graph Attention Network for Visual Question Answering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[52] Yuxin Peng,et al. Cross-media Multi-level Alignment with Relation Attention Network , 2018, IJCAI.
[53] Jungong Han,et al. Saliency-Guided Attention Network for Image-Sentence Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[54] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[55] Yan Yan,et al. Multi-Level Visual-Semantic Alignments with Relation-Wise Dual Attention Network for Image and Text Matching , 2019, IJCAI.
[56] Yang Yang,et al. Adversarial Cross-Modal Retrieval , 2017, ACM Multimedia.
[57] Xi Chen,et al. Stacked Cross Attention for Image-Text Matching , 2018, ECCV.