Motion-Appearance Co-memory Networks for Video Question Answering
暂无分享,去创建一个
Ramakant Nevatia | Jiyang Gao | Kan Chen | Runzhou Ge | R. Nevatia | J. Gao | Kan Chen | Runzhou Ge
[1] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[2] Yale Song,et al. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Saurabh Singh,et al. Where to Look: Focus Regions for Visual Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Tao Mei,et al. Multi-level Attention Networks for Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Ramakant Nevatia,et al. RED: Reinforced Encoder-Decoder Networks for Action Anticipation , 2017, BMVC.
[6] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.
[8] Kewei Tu,et al. Structured Attentions for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[9] Gunnar Farnebäck,et al. Two-Frame Motion Estimation Based on Polynomial Expansion , 2003, SCIA.
[10] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[11] Michael S. Bernstein,et al. Visual7W: Grounded Question Answering in Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] R. Nevatia,et al. TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[13] Bohyung Han,et al. MarioQA: Answering Questions by Watching Gameplay Videos , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[14] Byoung-Tak Zhang,et al. DeepStory: Video Story QA by Deep Embedded Memory Networks , 2017, IJCAI.
[15] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[16] Ramakant Nevatia,et al. TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[17] Jongwook Choi,et al. End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Ali Farhadi,et al. Visual Semantic Planning Using Deep Successor Representations , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[19] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[20] Ramakant Nevatia,et al. Cascaded Boundary Regression for Temporal Action Detection , 2017, BMVC.
[21] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[22] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Wei Xu,et al. ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering , 2015, ArXiv.
[24] Mario Fritz,et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.
[25] Yueting Zhuang,et al. Video Question Answering via Gradually Refined Attention over Appearance and Motion , 2017, ACM Multimedia.
[26] Wei Li,et al. CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016 , 2016, ArXiv.
[27] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[28] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[29] Kate Saenko,et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering , 2015, ECCV.
[30] Dan Klein,et al. Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Byoung-Tak Zhang,et al. Multimodal Residual Learning for Visual QA , 2016, NIPS.
[32] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[33] Sanja Fidler,et al. MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Shih-Fu Chang,et al. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Jiasen Lu,et al. Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.
[36] Seunghoon Hong,et al. Decomposing Motion and Content for Natural Video Sequence Prediction , 2017, ICLR.
[37] Shuicheng Yan,et al. A Focused Dynamic Attention Model for Visual Question Answering , 2016, ArXiv.
[38] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[39] Richard S. Zemel,et al. Exploring Models and Data for Image Question Answering , 2015, NIPS.
[40] Gunhee Kim,et al. A Read-Write Memory Network for Movie Story Understanding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[41] Trevor Darrell,et al. Learning to Reason: End-to-End Module Networks for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[42] Ramakant Nevatia,et al. Knowledge Aided Consistency for Weakly Supervised Phrase Grounding , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.