Graph-Based Multi-Interaction Network for Video Question Answering
Richang Hong | Zhou Zhao | Fei Wu | Weike Jin | Mao Gu
[1] Yale Song,et al. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Yueting Zhuang,et al. Video Question Answering via Gradually Refined Attention over Appearance and Motion , 2017, ACM Multimedia.
[3] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[4] Joshua B. Tenenbaum,et al. Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense , 2020, Engineering.
[5] Tao Mei,et al. Multi-level Attention Networks for Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Anton van den Hengel,et al. Graph-Structured Representations for Visual Question Answering , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Ramakant Nevatia,et al. Motion-Appearance Co-memory Networks for Video Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[8] Mario Fritz,et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.
[9] Juan Carlos Niebles,et al. Leveraging Video Descriptions to Learn Video Question Answering , 2016, AAAI.
[10] Zhou Zhao,et al. Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks , 2020, IEEE Transactions on Image Processing.
[11] Jun Xiao,et al. Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network , 2018, IJCAI.
[12] Nicholas Jing Yuan,et al. Multi-Turn Video Question Generation via Reinforced Multi-Choice Attention Network , 2021, IEEE Transactions on Circuits and Systems for Video Technology.
[13] Khalil Sima'an,et al. Graph Convolutional Encoders for Syntax-aware Neural Machine Translation , 2017, EMNLP.
[14] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Silvio Savarese,et al. Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Joan Bruna,et al. Few-Shot Learning with Graph Neural Networks , 2017, ICLR.
[17] Yichen Wei,et al. Learning Region Features for Object Detection , 2018, ECCV.
[18] Yueting Zhuang,et al. Temporality-enhanced knowledge memory network for factoid question answering , 2018, Frontiers of Information Technology & Electronic Engineering.
[19] Yueting Zhuang,et al. Disambiguating named entities with deep supervised learning via crowd labels , 2017, Frontiers of Information Technology & Electronic Engineering.
[20] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[21] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[22] Richard S. Zemel,et al. Exploring Models and Data for Image Question Answering , 2015, NIPS.
[23] Xuelong Li,et al. The Next Breakthroughs of Artificial Intelligence: The Interdisciplinary Nature of AI , 2020 .
[24] Zijian Zhang,et al. Moment Retrieval via Cross-Modal Interaction Networks With Query Reconstruction , 2020, IEEE Transactions on Image Processing.
[25] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[26] Abhinav Gupta,et al. Videos as Space-Time Region Graphs , 2018, ECCV.
[27] Sanja Fidler,et al. MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Meng Wang,et al. Tri-Clustered Tensor Completion for Social-Aware Image Tag Refinement , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[29] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.
[30] Yuxin Peng,et al. Object-Aware Aggregation With Bidirectional Temporal Graph for Video Captioning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Jinhui Tang,et al. Weakly-Shared Deep Transfer Networks for Heterogeneous-Domain Knowledge Propagation , 2015, ACM Multimedia.
[32] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[33] Jun Yu,et al. Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks , 2019, IEEE Transactions on Image Processing.
[34] Deng Cai,et al. Multi-Turn Video Question Answering via Hierarchical Attention Context Reinforced Networks , 2019, IEEE Transactions on Image Processing.
[35] Zhou Zhao,et al. Multi-interaction Network with Object Relation for Video Question Answering , 2019, ACM Multimedia.
[36] Judea Pearl,et al. Causal Inference , 2010 .
[37] Danfei Xu,et al. Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Asim Kadav,et al. Attend and Interact: Higher-Order Object Interactions for Video Understanding , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[39] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.
[40] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[41] Abhinav Gupta,et al. Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[42] Yahong Han,et al. Explore Multi-Step Reasoning in Video Question Answering , 2018, CoVieW@MM.
[43] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[44] Yi Yang,et al. Uncovering the Temporal Context for Video Question Answering , 2017, International Journal of Computer Vision.
[45] Yueting Zhuang,et al. Video Question Answering via Knowledge-based Progressive Spatial-Temporal Attention Network , 2019, ACM Trans. Multim. Comput. Commun. Appl..
[46] Tao Mei,et al. Structured Two-Stream Attention Network for Video Question Answering , 2019, AAAI.
[47] Xinlei Chen,et al. Grounded Video Description , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Serge J. Belongie,et al. Object categorization using co-occurrence, location and appearance , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[49] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[50] Dacheng Tao,et al. Deep Multimodal Neural Architecture Search , 2020, ACM Multimedia.
[51] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[52] Jun Yu,et al. ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering , 2019, AAAI.
[53] Yue-Guang Lyu. Artificial Intelligence: Enabling Technology to Empower Society , 2020 .
[54] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[55] Ali Farhadi,et al. Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[56] Runhao Zeng,et al. Graph Convolutional Networks for Temporal Action Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[57] Zhou Yu,et al. Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering , 2017, IEEE Transactions on Neural Networks and Learning Systems.
[58] Zhou Zhao,et al. Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks , 2020, IEEE Transactions on Circuits and Systems for Video Technology.
[59] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[60] Michael S. Bernstein,et al. Visual Relationship Detection with Language Priors , 2016, ECCV.
[61] Jung-Woo Ha,et al. Dual Attention Networks for Multimodal Reasoning and Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Chun Chen,et al. Challenges and opportunities: from big data to knowledge in AI 2.0 , 2017, Frontiers of Information Technology & Electronic Engineering.
[63] Zhou Yu,et al. Deep Modular Co-Attention Networks for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Jiasen Lu,et al. Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.
[65] Yichen Wei,et al. Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[66] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.
[67] Yueting Zhuang,et al. Video Dialog via Multi-Grained Convolutional Self-Attention Context Networks , 2019, SIGIR.
[68] Yunhe Pan,et al. Multiple Knowledge Representation of Artificial Intelligence , 2020 .
[69] Kate Saenko,et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering , 2015, ECCV.
[70] Jongwook Choi,et al. End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Zhou Yu,et al. Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks , 2018, IJCAI.
[72] Bo Dai,et al. Detecting Visual Relationships with Deep Relational Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[73] Furu Wei,et al. Video Dialog via Progressive Inference and Cross-Transformer , 2019, EMNLP/IJCNLP.
[74] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[75] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[76] Gunhee Kim,et al. A Read-Write Memory Network for Movie Story Understanding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[77] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.