Progressive Attention Memory Network for Movie Story Question Answering
Junyeong Kim | Chang Dong Yoo | Kyungsu Kim | Sungjin Kim | Minuk Ma
[1] Ramakant Nevatia, et al. Motion-Appearance Co-memory Networks for Video Question Answering, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Byoung-Tak Zhang, et al. Multimodal Dual Attention Memory for Video Story Question Answering, 2018, ECCV.
[3] Byoung-Tak Zhang, et al. DeepStory: Video Story QA by Deep Embedded Memory Networks, 2017, IJCAI.
[4] Yi Yang, et al. Uncovering the Temporal Context for Video Question Answering, 2017, International Journal of Computer Vision.
[5] Richard S. Zemel, et al. Image Question Answering: A Visual Semantic Embedding Model and a New Dataset, 2015, ArXiv.
[6] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Haoqi Fan, et al. Stacked Latent Attention for Multimodal Reasoning, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[8] Junyeong Kim, et al. Pivot Correlational Neural Network for Multimodal Video Categorization, 2018, ECCV.
[9] Gunhee Kim, et al. A Read-Write Memory Network for Movie Story Understanding, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[10] Liangliang Cao, et al. MemexQA: Visual Memex Question Answering, 2017, ArXiv.
[11] Mario Fritz, et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[12] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[13] L. Tucker, et al. Some mathematical notes on three-mode factor analysis, 1966, Psychometrika.
[14] Yale Song, et al. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Matthieu Cord, et al. MUTAN: Multimodal Tucker Fusion for Visual Question Answering, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[16] Gunhee Kim, et al. A Joint Sequence Fusion Model for Video Question Answering and Retrieval, 2018, ECCV.
[17] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[18] Liangliang Cao, et al. Focal Visual-Text Attention for Visual Question Answering, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[20] Sanja Fidler, et al. MovieQA: Understanding Stories in Movies through Question-Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[22] Jason Weston, et al. Memory Networks, 2014, ICLR.
[23] Bo Wang, et al. Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents, 2018, AAAI.
[24] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[25] John R. Hershey, et al. Attention-Based Multimodal Fusion for Video Description, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.
[27] Michael S. Bernstein, et al. Visual7W: Grounded Question Answering in Images, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Phong Nguyen, et al. SAT Reading Analysis Using Eye-Gaze Tracking Technology and Machine Learning, 2018, ITS.
[29] Jung-Woo Ha, et al. Dual Attention Networks for Multimodal Reasoning and Matching, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Licheng Yu, et al. TVQA: Localized, Compositional Video Question Answering, 2018, EMNLP.
[31] Trevor Darrell, et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, 2016, EMNLP.
[32] Alexander J. Smola, et al. Stacked Attention Networks for Image Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Jason Weston, et al. End-To-End Memory Networks, 2015, NIPS.