Video Story Question Answering with Character-Centric Scene Parsing and Question-Aware Temporal Attention
Dimitris N. Metaxas | Gerard de Melo | Hang Zhang | Shijie Geng | Ahmed Elgammal | Ji Zhang | Zuohui Fu