暂无分享,去创建一个
Marie-Francine Moens | Luc Van Gool | Tinne Tuytelaars | Dengxin Dai | Ted Zhang | T. Tuytelaars | L. Gool | Dengxin Dai | Marie-Francine Moens | Ted Zhang
[1] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[2] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[3] Vibhav Vineet,et al. ImageSpirit: Verbal Guided Image Parsing , 2013, ACM Trans. Graph..
[4] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[5] Jürgen Schmidhuber,et al. An Application of Recurrent Neural Networks to Discriminative Keyword Spotting , 2007, ICANN.
[6] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[7] Jung-Woo Ha,et al. Dual Attention Networks for Multimodal Reasoning and Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Jiasen Lu,et al. VQA: Visual Question Answering , 2015, ICCV.
[9] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[10] Rohini K. Srihari,et al. Show&Tell: A Semi-Automated Image Annotation System , 2000, IEEE Multim..
[11] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[12] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Kate Saenko,et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering , 2015, ECCV.
[14] Jürgen Schmidhuber,et al. Deep Networks with Internal Selective Attention through Feedback Connections , 2014, NIPS.
[15] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[16] Timothy J. Hazen,et al. Speech-based annotation and retrieval of digital photographs , 2007, INTERSPEECH.
[17] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[18] Mario Fritz,et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.
[19] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[20] Justin Salamon,et al. A Dataset and Taxonomy for Urban Sound Research , 2014, ACM Multimedia.
[21] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.
[22] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[23] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[24] Yuandong Tian,et al. Simple Baseline for Visual Question Answering , 2015, ArXiv.
[25] Jiasen Lu,et al. Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.
[26] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Jie Xu,et al. A Semantics-Based Approach for Speech Annotation of Images , 2011, IEEE Transactions on Knowledge and Data Engineering.
[28] Jeffrey P. Bigham,et al. VizWiz: nearly real-time answers to visual questions , 2010, W4A.
[29] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[30] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[31] James R. Glass,et al. Unsupervised Learning of Spoken Language with Visual Context , 2016, NIPS.
[32] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Anthony Hoogs,et al. Video content annotation using visual analysis and a large semantic knowledgebase , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..