Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
暂无分享,去创建一个
[1] Truyen Tran,et al. Hierarchical Conditional Relation Networks for Video Question Answering , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Yueting Zhuang,et al. Video Question Answering via Gradually Refined Attention over Appearance and Motion , 2017, ACM Multimedia.
[3] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[4] Chunxiao Liu,et al. Graph Structured Network for Image-Text Matching , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Sanja Fidler,et al. MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Anton van den Hengel,et al. Graph-Structured Representations for Visual Question Answering , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Runhao Zeng,et al. Location-Aware Graph Convolutional Networks for Video Question Answering , 2020, AAAI.
[8] Yun Fu,et al. Visual Semantic Reasoning for Image-Text Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Ramakant Nevatia,et al. Motion-Appearance Co-memory Networks for Video Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[11] Shu Zhang,et al. Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Yueting Zhuang,et al. Video Question Answering via Hierarchical Spatio-Temporal Attention Networks , 2017, IJCAI.
[13] Zhou Zhao,et al. Multi-interaction Network with Object Relation for Video Question Answering , 2019, ACM Multimedia.
[14] Liwei Wang,et al. Learning Two-Branch Neural Networks for Image-Text Matching Tasks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[15] Xi Chen,et al. Stacked Cross Attention for Image-Text Matching , 2018, ECCV.
[16] Jun Xiao,et al. Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network , 2018, IJCAI.
[17] Christopher D. Manning,et al. The Stanford Typed Dependencies Representation , 2008, CF+CDPE@COLING.
[18] Christopher D. Manning,et al. Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.
[19] Yi Yang,et al. Uncovering the Temporal Context for Video Question Answering , 2017, International Journal of Computer Vision.
[20] Licheng Yu,et al. TVQA: Localized, Compositional Video Question Answering , 2018, EMNLP.
[21] Yongdong Zhang,et al. Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching , 2019, ACM Multimedia.
[22] Sridha Sridharan,et al. Hierarchical Relational Attention for Video Question Answering , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).
[23] Jun Yu,et al. Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks , 2019, IEEE Transactions on Image Processing.
[24] Chunhua Shen,et al. Visual Question Answering with Memory-Augmented Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[25] Byoung-Tak Zhang,et al. Multimodal Residual Learning for Visual QA , 2016, NIPS.
[26] Xiaogang Wang,et al. Identity-Aware Textual-Visual Matching with Latent Co-attention , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Junyeong Kim,et al. Progressive Attention Memory Network for Movie Story Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Yutaka Satoh,et al. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Max Welling,et al. Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.
[30] Yale Song,et al. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Byoung-Tak Zhang,et al. Multimodal Dual Attention Memory for Video Story Question Answering , 2018, ECCV.
[32] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Claudio Gentile,et al. Linear Hinge Loss and Average Margin , 1998, NIPS.
[34] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Bo Wang,et al. Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents , 2018, AAAI.
[36] Jung-Woo Ha,et al. Dual Attention Networks for Multimodal Reasoning and Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Yue Gao,et al. Divide and Conquer: Question-Guided Spatio-Temporal Contextual Attention for Video Question Answering , 2020, AAAI.
[38] Yahong Han,et al. Explore Multi-Step Reasoning in Video Question Answering , 2018, CoVieW@MM.
[39] Shizhe Chen,et al. Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Chuang Gan,et al. Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering , 2019, AAAI.
[41] Mihai Surdeanu,et al. The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.
[42] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[43] Jingkuan Song,et al. Learnable Aggregating Net with Diversity Learning for Video Question Answering , 2019, ACM Multimedia.