Multi-source Multi-level Attention Networks for Visual Question Answering
Tao Mei | Xinmei Tian | Jianlong Fu | Dongfei Yu
[1] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[2] Sanja Fidler, et al. Skip-Thought Vectors, 2015, NIPS.
[3] Chunhua Shen, et al. What Value Do Explicit High Level Concepts Have in Vision to Language Problems?, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Razvan Pascanu, et al. A simple neural network module for relational reasoning, 2017, NIPS.
[5] Yichen Wei, et al. Relation Networks for Object Detection, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Qi Wu, et al. Image Captioning and Visual Question Answering Based on Attributes and External Knowledge, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[7] Kate Saenko, et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering, 2015, ECCV.
[8] Jürgen Schmidhuber, et al. Multi-dimensional Recurrent Neural Networks, 2007, ICANN.
[9] Wei Xu, et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN), 2014, ICLR.
[10] Trevor Darrell, et al. Long-term recurrent convolutional networks for visual recognition and description, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Saurabh Singh, et al. Where to Look: Focus Regions for Visual Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Tao Mei, et al. Multi-level Attention Networks for Visual Question Answering, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Meng Wang, et al. Multi-View Object Retrieval via Multi-Scale Topic Models, 2016, IEEE Transactions on Image Processing.
[14] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[15] Christopher Kanan, et al. An Analysis of Visual Question Answering Algorithms, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[16] Tao Mei, et al. Jointly Modeling Embedding and Translation to Bridge Video and Language, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Richard S. Zemel, et al. Exploring Models and Data for Image Question Answering, 2015, NIPS.
[18] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[19] Tao Mei, et al. Image Tag Refinement With View-Dependent Concept Representations, 2015, IEEE Transactions on Circuits and Systems for Video Technology.
[20] Jung-Woo Ha, et al. Dual Attention Networks for Multimodal Reasoning and Matching, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Kewei Tu, et al. Structured Attentions for Visual Question Answering, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[22] Xuelong Li, et al. Image Annotation by Multiple-Instance Learning With Discriminative Feature Mapping and Selection, 2014, IEEE Transactions on Cybernetics.
[23] Geoffrey Zweig, et al. From captions to visual concepts and back, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Shuicheng Yan, et al. A Focused Dynamic Attention Model for Visual Question Answering, 2016, ArXiv.
[25] Dhruv Batra, et al. Analyzing the Behavior of Visual Question Answering Models, 2016, EMNLP.
[26] Jiebo Luo, et al. Image Captioning with Semantic Attention, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Tao Mei, et al. Relaxing from Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[28] Tao Mei, et al. Video Captioning with Transferred Semantic Attributes, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Christoph Meinel, et al. Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning, 2018, ACM Trans. Multim. Comput. Commun. Appl.
[30] Jiasen Lu, et al. Hierarchical Question-Image Co-Attention for Visual Question Answering, 2016, NIPS.
[31] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Richard Socher, et al. Dynamic Memory Networks for Visual and Textual Question Answering, 2016, ICML.
[33] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[34] Tao Mei, et al. Let Your Photos Talk: Generating Narrative Paragraph for Photo Stream via Bidirectional Attention Recurrent Neural Networks, 2017, AAAI.
[35] Tao Mei, et al. Boosting Image Captioning with Attributes, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[36] Tao Mei, et al. Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions, 2018, EMNLP.
[37] Matthieu Cord, et al. MUTAN: Multimodal Tucker Fusion for Visual Question Answering, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[38] Meng Wang, et al. Coherent Semantic-Visual Indexing for Large-Scale Image Retrieval in the Cloud, 2017, IEEE Transactions on Image Processing.
[39] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[40] Yash Goyal, et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[42] Wei Xu, et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question, 2015, NIPS.
[43] Liangliang Cao, et al. Focal Visual-Text Attention for Visual Question Answering, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[44] Dhruv Batra, et al. Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?, 2016, EMNLP.
[45] Mario Fritz, et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[46] Bohyung Han, et al. Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Alexander J. Smola, et al. Stacked Attention Networks for Image Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Trevor Darrell, et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, 2016, EMNLP.
[49] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Allan Jabri, et al. Revisiting Visual Question Answering Baselines, 2016, ECCV.
[51] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[52] Marcus Rohrbach. Attributes as Semantic Units between Natural Language and Visual Recognition, 2016, ArXiv.
[53] Peng Wang, et al. Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge from External Sources, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Tao Mei, et al. Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks, 2016, IJCAI.