Hierarchical Conditional Relation Networks for Video Question Answering
Truyen Tran | Svetha Venkatesh | Vuong Le | Thao Minh Le
[1] Ramakant Nevatia, et al. Motion-Appearance Co-memory Networks for Video Question Answering, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Truyen Tran, et al. Neural Reasoning, Fast and Slow, for Video Question Answering, 2020, 2020 International Joint Conference on Neural Networks (IJCNN).
[3] Svetha Venkatesh, et al. Learning to Reason with Relational Video Representation for Question Answering, 2019, ArXiv.
[4] Yann LeCun, et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] Jun Xiao, et al. Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network, 2018, IJCAI.
[6] Liangliang Cao, et al. Focal Visual-Text Attention for Visual Question Answering, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Shu Zhang, et al. Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Xiao Liu, et al. Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding, 2017, ArXiv.
[9] Jingkuan Song, et al. Learnable Aggregating Net with Diversity Learning for Video Question Answering, 2019, ACM Multimedia.
[10] Sanja Fidler, et al. MovieQA: Understanding Stories in Movies through Question-Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[12] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Chuang Gan, et al. Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering, 2019, AAAI.
[14] Zhou Zhao, et al. Multi-interaction Network with Object Relation for Video Question Answering, 2019, ACM Multimedia.
[15] Yale Song, et al. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Yutaka Satoh, et al. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Licheng Yu, et al. TVQA: Localized, Compositional Video Question Answering, 2018, EMNLP.
[18] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] James J. Little, et al. Spatio-temporal Relational Reasoning for Video Question Answering, 2019, BMVC.
[20] Junyeong Kim, et al. Progressive Attention Memory Network for Movie Story Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] S. Venkatesh, et al. Learning Deep Matrix Representations, 2017.
[22] Yi Yang, et al. Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Chuan-Sheng Foo, et al. Holistic Multi-Modal Memory Network for Movie Question Answering, 2018, IEEE Transactions on Image Processing.
[24] Juan Carlos Niebles, et al. Leveraging Video Descriptions to Learn Video Question Answering, 2016, AAAI.
[25] Yueting Zhuang, et al. Video Question Answering via Hierarchical Spatio-Temporal Attention Networks, 2017, IJCAI.
[26] Bo Wang, et al. Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents, 2018, AAAI.
[27] Jun Yu, et al. Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks, 2019, IEEE Transactions on Image Processing.
[28] Yueting Zhuang, et al. Video Question Answering via Gradually Refined Attention over Appearance and Motion, 2017, ACM Multimedia.
[29] Gunhee Kim, et al. A Read-Write Memory Network for Movie Story Understanding, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[30] Tao Mei, et al. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[31] Byoung-Tak Zhang, et al. Bilinear Attention Networks, 2018, NeurIPS.
[32] Byoung-Tak Zhang, et al. Multimodal Dual Attention Memory for Video Story Question Answering, 2018, ECCV.
[33] Rita Cucchiara, et al. Hierarchical Boundary-Aware Neural Encoder for Video Captioning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Aaron C. Courville, et al. FiLM: Visual Reasoning with a General Conditioning Layer, 2017, AAAI.
[36] Bolei Zhou, et al. Temporal Relational Reasoning in Videos, 2017, ECCV.
[37] Byoung-Tak Zhang, et al. DeepStory: Video Story QA by Deep Embedded Memory Networks, 2017, IJCAI.
[38] Rainer Lienhart, et al. Abstracting home video automatically, 1999, MULTIMEDIA '99.
[39] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[40] Kaiming He, et al. Long-Term Feature Banks for Detailed Video Understanding, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Yahong Han, et al. Explore Multi-Step Reasoning in Video Question Answering, 2018, CoVieW@MM.
[42] Long Chen, et al. Video Question Answering via Attribute-Augmented Attention Network Learning, 2017, SIGIR.
[43] Sridha Sridharan, et al. Hierarchical Relational Attention for Video Question Answering, 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).
[44] Tao Mei, et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[46] Xing Zhang, et al. Non-local NetVLAD Encoding for Video Classification, 2018, ECCV Workshops.
[47] Feng Mao, et al. Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network, 2018, ECCV Workshops.
[48] Yi Yang, et al. Uncovering the Temporal Context for Video Question Answering, 2017, International Journal of Computer Vision.
[49] Meng Wang, et al. Question-Aware Tube-Switch Network for Video Question Answering, 2019, ACM Multimedia.