EgoVQA - An Egocentric Video Question Answering Benchmark Dataset