Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding
暂无分享,去创建一个
Shih-Fu Chang | Surabhi Bhargava | Brian Chen | Carl Vondrick | Svebor Karaman | Hassan Akbari | Shih-Fu Chang | Carl Vondrick | Svebor Karaman | Surabhi Bhargava | Brian Chen | Hassan Akbari
[1] Minh N. Do,et al. Unsupervised Textual Grounding: Linking Words to Image Concepts , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] C. Lawrence Zitnick,et al. Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.
[3] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[5] Koen E. A. van de Sande,et al. Selective Search for Object Recognition , 2013, International Journal of Computer Vision.
[6] Tat-Seng Chua,et al. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Ramakant Nevatia,et al. MSRC: multimodal spatial regression with semantic context for phrase grounding , 2017, International Journal of Multimedia Information Retrieval.
[8] Rada Mihalcea,et al. Structured Matching for Phrase Localization , 2016, ECCV.
[9] Qi Wu,et al. Visual Grounding via Accumulated Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] Thorsten Brants,et al. One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.
[11] Ramakant Nevatia,et al. Knowledge Aided Consistency for Weakly Supervised Phrase Grounding , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[13] Svetlana Lazebnik,et al. Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[14] Larry P. Heck,et al. Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.
[15] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[16] Zhe Gan,et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Trevor Darrell,et al. Grounding Visual Explanations , 2018, ECCV.
[18] Vincent Dumoulin,et al. Deconvolution and Checkerboard Artifacts , 2016 .
[19] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[20] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .
[21] Xiang Zhang,et al. Character-level Convolutional Networks for Text Classification , 2015, NIPS.
[22] Yong Jae Lee,et al. Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Ondrej Chum,et al. CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples , 2016, ECCV.
[24] Jiasen Lu,et al. Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.
[25] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[26] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[27] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[28] Kate Saenko,et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering , 2015, ECCV.
[29] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Geoffrey Zweig,et al. Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.
[31] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[33] Ramakant Nevatia,et al. Query-Guided Regression Network with Context Policy for Phrase Grounding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[34] Albert Gordo,et al. Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.
[35] Martin Engilberge,et al. Finding Beans in Burgers: Deep Semantic-Visual Embedding with Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[36] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[37] Oliver Schulte,et al. Image Caption Generation with Hierarchical Contextual Visual Spatial Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[38] Biing-Hwang Juang,et al. Minimum classification error rate methods for speech recognition , 1997, IEEE Trans. Speech Audio Process..
[39] Paul Clough,et al. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems , 2006 .
[40] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[41] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[42] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[43] Fang Zhao,et al. Weakly Supervised Phrase Localization with Multi-scale Anchored Transformer Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[44] Licheng Yu,et al. MAttNet: Modular Attention Network for Referring Expression Comprehension , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[45] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[46] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Kate Saenko,et al. Top-Down Visual Saliency Guided by Captions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[49] Chenxi Liu,et al. Attention Correctness in Neural Image Captioning , 2016, AAAI.
[50] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Hod Lipson,et al. Understanding Neural Networks Through Deep Visualization , 2015, ArXiv.
[52] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[54] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Takayuki Okatani,et al. Improved Fusion of Visual and Language Representations by Dense Symmetric Co-attention for Visual Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[56] Jivko Sinapov,et al. Guiding Interaction Behaviors for Multi-modal Grounded Language Learning , 2017, RoboNLP@ACL.
[57] Zhe L. Lin,et al. Top-Down Neural Attention by Excitation Backprop , 2016, International Journal of Computer Vision.
[58] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.
[59] Vineet Gandhi,et al. Learning Unsupervised Visual Grounding Through Semantic Self-Supervision , 2018, IJCAI.
[60] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[61] Jinjun Xiong,et al. Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts , 2018, NIPS.
[62] Li Fei-Fei,et al. Progressive Neural Architecture Search , 2017, ECCV.
[63] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).