The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines
暂无分享,去创建一个
Dima Damen | Hazel Doughty | Giovanni Maria Farinella | Sanja Fidler | Antonino Furnari | Davide Moltisanti | Jonathan Munro | Toby Perrett | Will Price | Michael Wray | Evangelos Kazakos | S. Fidler | D. Damen | G. Farinella | E. Kazakos | Hazel Doughty | Antonino Furnari | D. Moltisanti | Jonathan Munro | Toby Perrett | Will Price | Michael Wray
[1] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[2] Bolei Zhou,et al. Moments in Time Dataset: One Million Videos for Event Understanding , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Deva Ramanan,et al. Detecting activities of daily living in first-person camera views , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[4] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Linchao Zhu,et al. Symbiotic Attention with Privileged Information for Egocentric Action Recognition , 2020, AAAI.
[6] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[7] Andrew Zisserman,et al. A Short Note on the Kinetics-700 Human Action Dataset , 2019, ArXiv.
[8] Susanne Westphal,et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[9] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[10] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[11] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[12] Ivan Laptev,et al. Joint Discovery of Object States and Manipulation Actions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[13] Nicholas Rhinehart,et al. First-Person Activity Forecasting with Online Inverse Reinforcement Learning , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[14] Yong Jae Lee,et al. Audiovisual SlowFast Networks for Video Recognition , 2020, ArXiv.
[15] Jitendra Malik,et al. From Lifestyle Vlogs to Everyday Interactions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] Ivan Laptev,et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Horst Bischof,et al. A Duality Based Approach for Realtime TV-L1 Optical Flow , 2007, DAGM-Symposium.
[18] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[19] Dima Damen,et al. EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[21] Dima Damen,et al. Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[22] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[23] Zhaoxiang Zhang,et al. Sequence Level Semantics Aggregation for Video Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[24] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[25] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[26] Ted Pedersen,et al. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet , 2002, CICLing.
[27] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[28] Yong Jae Lee,et al. Discovering important people and objects for egocentric video summarization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[29] Sanja Fidler,et al. MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Stephen J. McKenna,et al. Combining embedded accelerometers with computer vision for recognizing food preparation activities , 2013, UbiComp.
[31] Jessica K. Hodgins,et al. Guide to the Carnegie Mellon University Multimodal Activity (CMU-MMAC) Database , 2008 .
[32] Chenliang Xu,et al. Towards Automatic Learning of Procedures From Web Instructional Videos , 2017, AAAI.
[33] Thomas Serre,et al. The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[34] Dima Damen,et al. Action Recognition From Single Timestamp Supervision in Untrimmed Videos , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[36] Cordelia Schmid,et al. Joint Learning of Object and Action Detectors , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[37] Heng Wang,et al. SLAC: A Sparsely Labeled Dataset for Action Classification and Localization , 2017, ArXiv.
[38] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[39] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Larry H. Matthies,et al. First-Person Activity Recognition: What Are They Doing to Me? , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[41] Michael Gygli,et al. Efficient Object Annotation via Speaking and Pointing , 2019, International Journal of Computer Vision.
[42] Kristen Grauman,et al. Ego-Topo: Environment Affordances From Egocentric Video , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Apostol Natsev,et al. YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.
[44] Dima Damen,et al. Human Routine Change Detection using Bayesian Modelling , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).
[45] James M. Rehg,et al. Learning to Recognize Daily Actions Using Gaze , 2012, ECCV.
[46] Cordelia Schmid,et al. Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos , 2018, ArXiv.
[47] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Sergio Guadarrama,et al. Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Bolei Zhou,et al. Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[52] Giovanni Maria Farinella,et al. Next-active-object prediction from egocentric videos , 2017, J. Vis. Commun. Image Represent..
[53] Adam Kilgarriff,et al. English Senseval: Report and Results , 2000, LREC.
[54] Jianbo Shi,et al. Egocentric Future Localization , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Remco C. Veltkamp,et al. Egocentric Hand Track and Object-Based Human Action Recognition , 2019, 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI).
[56] James M. Rehg,et al. Social interactions: A first-person perspective , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[57] Ramakant Nevatia,et al. RED: Reinforced Encoder-Decoder Networks for Action Anticipation , 2017, BMVC.
[58] Learning to Learn Words from Narrated Video , 2019, ArXiv.
[59] Antonio Torralba,et al. Anticipating Visual Representations from Unlabeled Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Hang Zhao,et al. HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization , 2017, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[61] Dima Damen,et al. Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[62] Dima Damen,et al. You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video , 2014, BMVC.
[63] Michael E. Lesk,et al. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone , 1986, SIGDOC '86.
[64] Simone Calderara,et al. Understanding social relationships in egocentric vision , 2015, Pattern Recognit..
[65] Dima Damen,et al. Multi-Modal Domain Adaptation for Fine-Grained Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[66] Christian Wolf,et al. Object Level Visual Reasoning in Videos , 2018, ECCV.
[67] Song Han,et al. Temporal Shift Module for Efficient Video Understanding , 2018, ArXiv.
[68] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.