Ask Your Neurons: A Deep Learning Approach to Visual Question Answering
暂无分享,去创建一个
Mario Fritz | Marcus Rohrbach | Mateusz Malinowski | Mario Fritz | Mateusz Malinowski | Marcus Rohrbach
[1] Jacob Cohen. A Coefficient of Agreement for Nominal Scales , 1960 .
[2] Jacob Cohen,et al. The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability , 1973 .
[3] Martha Palmer,et al. Verb Semantics and Lexical Selection , 1994, ACL.
[4] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[5] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[6] Simon Haykin,et al. GradientBased Learning Applied to Document Recognition , 2001 .
[7] Hinrich Schütze,et al. Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.
[8] Dan Klein,et al. Accurate Unlexicalized Parsing , 2003, ACL.
[9] P. Kantor. Foundations of Statistical Natural Language Processing , 2001, Information Retrieval.
[10] Dima Damen,et al. Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[11] Dan Klein,et al. Learning Dependency-Based Compositional Semantics , 2011, CL.
[12] Luke S. Zettlemoyer,et al. A Joint Model of Language and Perception for Grounded Attribute Learning , 2012, ICML.
[13] Razvan Pascanu,et al. Theano: new features and speed improvements , 2012, ArXiv.
[14] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[15] Derek Hoiem,et al. Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.
[16] Bernt Schiele,et al. Grounding Action Descriptions in Videos , 2013, TACL.
[17] Gerhard Weikum,et al. Fine-grained Semantic Typing of Emerging Entities , 2013, ACL.
[18] Lucy Vanderwende,et al. Learning the Visual Interpretation of Sentences , 2013, 2013 IEEE International Conference on Computer Vision.
[19] Luísa Coheur,et al. Learning to answer questions , 2013, ArXiv.
[20] Jayant Krishnamurthy,et al. Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World , 2013, TACL.
[21] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[22] Wojciech Zaremba,et al. Learning to Execute , 2014, ArXiv.
[23] Sanja Fidler,et al. What Are You Talking About? Text-to-Image Coreference , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[24] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[25] Phil Blunsom,et al. A Convolutional Neural Network for Modelling Sentences , 2014, ACL.
[26] Jonathan Berant,et al. Semantic Parsing via Paraphrasing , 2014, ACL.
[27] Mario Fritz,et al. Towards a Visual Turing Challenge , 2014, ArXiv.
[28] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[29] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[30] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[31] Mario Fritz,et al. A Pooling Approach to Modelling Spatial Relations for Image Retrieval and Annotation , 2014, ArXiv.
[32] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[33] Mario Fritz,et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.
[34] Richard Socher,et al. A Neural Network for Factoid Question Answering over Paragraphs , 2014, EMNLP.
[35] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[36] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[37] Donald Geman,et al. Visual Turing test for computer vision systems , 2015, Proceedings of the National Academy of Sciences.
[38] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[39] Yuandong Tian,et al. Simple Baseline for Visual Question Answering , 2015, ArXiv.
[40] Wei Xu,et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question , 2015, NIPS.
[41] Yi Li,et al. Compositional Memory for Visual Question Answering , 2015, ArXiv.
[42] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[43] Jason Weston,et al. Memory Networks , 2014, ICLR.
[44] Jiasen Lu,et al. VQA: Visual Question Answering , 2015, ICCV.
[45] Marcus Rohrbach,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[46] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[47] Yi Yang,et al. Uncovering Temporal Context for Video Question and Answering , 2015, ArXiv.
[48] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[50] Wei Xu,et al. ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering , 2015, ArXiv.
[51] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[52] Licheng Yu,et al. Visual Madlibs: Fill in the Blank Description Generation and Question Answering , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[53] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Richard S. Zemel,et al. Image Question Answering: A Visual Semantic Embedding Model and a New Dataset , 2015, ArXiv.
[55] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[57] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[58] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[59] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Mario Fritz,et al. Hard to Cheat: A Turing Test based on Answering Questions about Images , 2015, AAAI 2015.
[61] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[62] Saurabh Singh,et al. Where to Look: Focus Regions for Visual Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Sanja Fidler,et al. MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Kate Saenko,et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering , 2015, ECCV.
[65] Peng Wang,et al. Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge from External Sources , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[66] Dan Klein,et al. Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Bernt Schiele,et al. Multi-cue Zero-Shot Learning with Strong Supervision , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Jiasen Lu,et al. Hierarchical Co-Attention for Visual Question Answering , 2016 .
[69] Licheng Yu,et al. Modeling Context in Referring Expressions , 2016, ECCV.
[70] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Bohyung Han,et al. Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[73] Aaditya Prakash,et al. Highway Networks for Visual Question Answering , 2016 .
[74] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.
[75] Mario Fritz,et al. Tutorial on Answering Questions about Images with Deep Learning , 2016, ArXiv.
[76] Shuicheng Yan,et al. A Focused Dynamic Attention Model for Visual Question Answering , 2016, ArXiv.
[77] Dan Klein,et al. Learning to Compose Neural Networks for Question Answering , 2016, NAACL.
[78] Lin Ma,et al. Learning to Answer Questions from Image Using Convolutional Neural Network , 2015, AAAI.
[79] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[80] Trevor Darrell,et al. Segmentation from Natural Language Expressions , 2016, ECCV.
[81] Yin Li,et al. Learning Deep Structure-Preserving Image-Text Embeddings , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[82] Mario Fritz,et al. Xplore-M-Ego: Contextual Media Retrieval Using Natural Language Queries , 2016, ICMR.
[83] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[84] Michael S. Bernstein,et al. Visual7W: Grounded Question Answering in Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[85] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[86] Christopher Kanan,et al. Answer-Type Prediction for Visual Question Answering , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[87] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[88] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[89] Tatsuya Harada,et al. DualNet: Domain-invariant network for visual question answering , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).
[90] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[91] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).