Video Question Answering via Hierarchical Spatio-Temporal Attention Networks
Yueting Zhuang | Deng Cai | Xiaofei He | Zhou Zhao | Qifan Yang
[1] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[2] Yi Yang, et al. Uncovering Temporal Context for Video Question and Answering, 2015, ArXiv.
[3] P. Cochat, et al. Et al, 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.
[4] Licheng Yu, et al. Visual Madlibs: Fill in the Blank Description Generation and Question Answering, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[5] Mario Fritz, et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input, 2014, NIPS.
[6] Kewei Tu, et al. Joint Video and Text Parsing for Understanding Events and Answering Queries, 2013, IEEE MultiMedia.
[7] Richard Socher, et al. Dynamic Memory Networks for Visual and Textual Question Answering, 2016, ICML.
[8] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[9] Byoung-Tak Zhang, et al. Multimodal Residual Learning for Visual QA, 2016, NIPS.
[10] Mubarak Shah, et al. Video Fill in the Blank with Merging LSTMs, 2016, ArXiv.
[11] Wilfred Ng, et al. Expert Finding for Question Answering via Graph Regularized Matrix Completion, 2015, IEEE Transactions on Knowledge and Data Engineering.
[12] Yueting Zhuang, et al. Expert Finding for Community-Based Question Answering via Ranking Metric Network Learning, 2016, IJCAI.
[13] Jiasen Lu, et al. Hierarchical Question-Image Co-Attention for Visual Question Answering, 2016, NIPS.
[14] Richard Socher, et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, 2015, ICML.
[15] Qi Wu, et al. Visual question answering: A survey of methods and datasets, 2016, Comput. Vis. Image Underst.
[16] Saurabh Singh, et al. Where to Look: Focus Regions for Visual Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[18] Yale Song, et al. TGIF: A New Dataset and Benchmark on Animated GIF Description, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Christopher Joseph Pal, et al. Describing Videos by Exploiting Temporal Structure, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[20] Basura Fernando, et al. Learning End-to-end Video Classification with Rank-Pooling, 2016, ICML.
[21] Sanja Fidler, et al. MovieQA: Understanding Stories in Movies through Question-Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Peter Kulchyski. and, 2015.
[23] Kai Yu, et al. Very deep convolutional neural networks for LVCSR, 2015, INTERSPEECH.
[24] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, ArXiv.
[25] Jiaya Jia, et al. Visual Question Answering with Question Representation Update (QRU), 2016, NIPS.
[26] James Pustejovsky. Proceedings of the 32nd annual meeting on Association for Computational Linguistics, 1994.
[27] Alexander J. Smola, et al. Stacked Attention Networks for Image Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Jason Weston, et al. End-To-End Memory Networks, 2015, NIPS.
[29] Martha Palmer, et al. Verb Semantics and Lexical Selection, 1994, ACL.
[30] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[31] Noah A. Smith, et al. Good Question! Statistical Ranking for Question Generation, 2010, NAACL.