Spatio-temporal Relational Reasoning for Video Question Answering
暂无分享,去创建一个
[1] Ramakant Nevatia,et al. Motion-Appearance Co-memory Networks for Video Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Yueting Zhuang,et al. Video Question Answering via Gradually Refined Attention over Appearance and Motion , 2017, ACM Multimedia.
[3] Cordelia Schmid,et al. Learning to Track for Spatio-Temporal Action Localization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[4] Yale Song,et al. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Susanne Westphal,et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[6] Abhinav Gupta,et al. Videos as Space-Time Region Graphs , 2018, ECCV.
[7] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Asim Kadav,et al. Attend and Interact: Higher-Order Object Interactions for Video Understanding , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[9] Bohyung Han,et al. MarioQA: Answering Questions by Watching Gameplay Videos , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[10] Razvan Pascanu,et al. Discovering objects and their relations from entangled scene representations , 2017, ICLR.
[11] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[12] Jongwook Choi,et al. End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Chuang Gan,et al. Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering , 2019, AAAI.
[14] Shih-Fu Chang,et al. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Yahong Han,et al. Explore Multi-Step Reasoning in Video Question Answering , 2018, CoVieW@MM.
[16] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[18] Byoung-Tak Zhang,et al. DeepStory: Video Story QA by Deep Embedded Memory Networks , 2017, IJCAI.
[19] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[20] Razvan Pascanu,et al. A simple neural network module for relational reasoning , 2017, NIPS.
[21] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[22] Ah Chung Tsoi,et al. The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.
[23] Richard S. Zemel,et al. Exploring Models and Data for Image Question Answering , 2015, NIPS.
[24] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[26] Gunhee Kim,et al. A Read-Write Memory Network for Movie Story Understanding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Tao Mei,et al. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).