Uncovering the Temporal Context for Video Question Answering
Yi Yang | Alexander G. Hauptmann | Linchao Zhu | Zhongwen Xu