Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100
暂无分享,去创建一个
Jian Ma | Dima Damen | Hazel Doughty | Giovanni Maria Farinella | Antonino Furnari | Evangelos Kazakos | Davide Moltisanti | Jonathan Munro | Toby Perrett | Will Price | Michael Wray | D. Damen | G. Farinella | E. Kazakos | Hazel Doughty | Antonino Furnari | Jian Ma | D. Moltisanti | Jonathan Munro | Toby Perrett | Will Price | Michael Wray
[1] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[2] Kate Saenko,et al. VisDA: The Visual Domain Adaptation Challenge , 2017, ArXiv.
[3] Ivan Laptev,et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[4] Jun Li,et al. Weakly Supervised Energy-Based Learning for Action Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[5] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Chenliang Xu,et al. Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Yong Jae Lee,et al. Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[8] Juan Carlos Niebles,et al. D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Sergio Guadarrama,et al. Tracking Emerges by Colorizing Videos , 2018, ECCV.
[10] Cordelia Schmid,et al. Supplementary Material: AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[11] Giovanni Maria Farinella,et al. Leveraging Uncertainty to Rethink Loss Functions and Evaluation Measures for Egocentric Action Anticipation , 2018, ECCV Workshops.
[12] Michael Gygli,et al. Efficient Object Annotation via Speaking and Pointing , 2019, International Journal of Computer Vision.
[13] Daochang Liu,et al. Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Juan Carlos Niebles,et al. Connectionist Temporal Modeling for Weakly Supervised Action Labeling , 2016, ECCV.
[15] Dima Damen,et al. EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[16] Sethuraman Panchanathan,et al. Deep Hashing Network for Unsupervised Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Dima Damen,et al. Scaling Egocentric Vision: The Dataset , 2018, ECCV.
[18] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[19] Cordelia Schmid,et al. Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos , 2018, ArXiv.
[20] Vladlen Koltun,et al. Does computer vision matter for action? , 2019, Science Robotics.
[21] Paolo Favaro,et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.
[22] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[23] Jianmin Wang,et al. Correlation Hashing Network for Efficient Cross-Modal Retrieval , 2016, BMVC.
[24] Dima Damen,et al. Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[25] Fei-Fei Li,et al. What's the Point: Semantic Segmentation with Point Supervision , 2015, ECCV.
[26] Stefan Milz,et al. WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[27] K. S. Venkatesh,et al. Deep Domain Adaptation in Action Space , 2018, BMVC.
[28] Qiang Xu,et al. nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Stephen J. McKenna,et al. Combining embedded accelerometers with computer vision for recognizing food preparation activities , 2013, UbiComp.
[30] Bernt Schiele,et al. Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods , 2018, ICLR.
[31] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[32] Giovanni Maria Farinella,et al. Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[33] Cordelia Schmid,et al. Actions in context , 2009, CVPR.
[34] James M. Rehg,et al. Learning to Recognize Daily Actions Using Gaze , 2012, ECCV.
[35] Jitendra Malik,et al. Visual Semantic Role Labeling , 2015, ArXiv.
[36] Juan Carlos Niebles,et al. Adversarial Cross-Domain Action Recognition with Co-Attention , 2019, AAAI.
[37] Chenliang Xu,et al. Towards Automatic Learning of Procedures From Web Instructional Videos , 2017, AAAI.
[38] Thomas Serre,et al. The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[39] Joerg P. Ueberla,et al. Domain adaptation with clustered language models , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[40] Xinlei Chen,et al. Grounded Video Description , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[42] Cordelia Schmid,et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[43] Cees Snoek,et al. Online Action Detection , 2016, ECCV.
[44] Alex Bewley,et al. Incremental Adversarial Domain Adaptation for Continually Changing Environments , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[45] Bernard Ghanem,et al. What Do I Annotate Next? An Empirical Study of Active Learning for Action Localization , 2018, ECCV.
[46] Nikhil Rasiwasia,et al. Cluster Canonical Correlation Analysis , 2014, AISTATS.
[47] Ruxin Chen,et al. Temporal Attentive Alignment for Large-Scale Video Domain Adaptation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[48] Juergen Gall,et al. NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[49] Peter Kontschieder,et al. The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[50] Alexei A. Efros,et al. Unbiased look at dataset bias , 2011, CVPR 2011.
[51] Charless C. Fowlkes,et al. Weakly-Supervised Action Localization With Background Modeling , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[52] Bo Wang,et al. Moment Matching for Multi-Source Domain Adaptation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[53] Gregory D. Hager,et al. Temporal Convolutional Networks for Action Segmentation and Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Dima Damen,et al. You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video , 2014, BMVC.
[55] Philip H. S. Torr,et al. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Xiaohua Zhai,et al. A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark , 2019 .
[57] Thomas Serre,et al. HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.
[58] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Bohyung Han,et al. Weakly Supervised Action Localization by Sparse Temporal Pooling Network , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[60] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[61] Bolei Zhou,et al. Moments in Time Dataset: One Million Videos for Event Understanding , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[62] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[63] Ruigang Yang,et al. The ApolloScape Dataset for Autonomous Driving , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[64] Ian D. Reid,et al. High Five: Recognising human interactions in TV shows , 2010, BMVC.
[65] Scott Workman,et al. Predicting Ground-Level Scene Layout from Aerial Imagery , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[66] Edward K. Wong,et al. Dual many-to-one-encoder-based transfer learning for cross-dataset human action recognition , 2016, Image Vis. Comput..
[67] Leland McInnes,et al. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction , 2018, ArXiv.
[68] Cordelia Schmid,et al. A flexible model for training action localization with varying levels of supervision , 2018, NeurIPS.
[69] J. Heckman. Sample selection bias as a specification error , 1979 .
[70] Changsheng Xu,et al. Learning Consistent Feature Representation for Cross-Modal Multimedia Retrieval , 2015, IEEE Transactions on Multimedia.
[71] James M. Rehg,et al. Delving into egocentric actions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Shilei Wen,et al. BMN: Boundary-Matching Network for Temporal Action Proposal Generation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[73] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[74] Jaana Kekäläinen,et al. Cumulated gain-based evaluation of IR techniques , 2002, TOIS.
[75] Yoichi Sato,et al. EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report , 2021, ArXiv.
[76] Luc Van Gool,et al. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[77] Arkadiusz Stopczynski,et al. Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[78] Miguel Cazorla,et al. ImageCLEF 2014: Overview and Analysis of the Results , 2014, CLEF.
[79] Paul Newman,et al. 1 year, 1000 km: The Oxford RobotCar dataset , 2017, Int. J. Robotics Res..
[80] Hang Zhao,et al. HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization , 2017, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[81] Dima Damen,et al. Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[82] Cordelia Schmid,et al. Human Action Localization with Sparse Spatial Supervision , 2017 .
[83] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[84] Li Fei-Fei,et al. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos , 2015, International Journal of Computer Vision.
[85] Kaiming He,et al. Rethinking ImageNet Pre-Training , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[86] Bolei Zhou,et al. Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[87] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..
[88] Cees Snoek,et al. Spot On: Action Localization from Pointly-Supervised Proposals , 2016, ECCV.
[89] Luc Van Gool,et al. UntrimmedNets for Weakly Supervised Action Recognition and Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[90] Kaiming He,et al. Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[91] Dima Damen,et al. Action Recognition From Single Timestamp Supervision in Untrimmed Videos , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[92] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[93] Cordelia Schmid,et al. Weakly Supervised Action Labeling in Videos under Ordering Constraints , 2014, ECCV.
[94] Chuang Gan,et al. TSM: Temporal Shift Module for Efficient Video Understanding , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[95] Leonidas J. Guibas,et al. Taskonomy: Disentangling Task Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[96] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[97] Hema Swetha Koppula,et al. Anticipating Human Activities Using Object Affordances for Reactive Robotic Response , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[98] Derek Hoiem,et al. Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.
[99] Deva Ramanan,et al. Detecting activities of daily living in first-person camera views , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[100] Susanne Westphal,et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[101] Gernot A. Fink,et al. Detection and Retrieval of Out-of-Distribution Objects in Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[102] Emanuele Alberti,et al. PoliTO-IIT Submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition , 2021, ArXiv.
[103] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[104] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[105] Maneesh Singh,et al. Progressive Domain Adaptation for Object Detection , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[106] Dima Damen,et al. Scaling Egocentric Vision: The Open image in new window Dataset , 2018 .
[107] Changsheng Xu,et al. A Unified Framework for Multimodal Domain Adaptation , 2018, ACM Multimedia.
[108] Jonathan Munro,et al. Multi-Modal Domain Adaptation for Fine-Grained Action Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[109] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[110] Andrew Zisserman,et al. A Short Note on the Kinetics-700 Human Action Dataset , 2019, ArXiv.
[111] Yuan Shi,et al. Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[112] Horst Bischof,et al. A Duality Based Approach for Realtime TV-L1 Optical Flow , 2007, DAGM-Symposium.
[113] Dennis Koelma,et al. The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection , 2016, ICMR.
[114] Trevor Darrell,et al. Adapting Visual Category Models to New Domains , 2010, ECCV.
[115] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[116] Ryan M. Eustice,et al. University of Michigan North Campus long-term vision and lidar dataset , 2016, Int. J. Robotics Res..
[117] Leland McInnes,et al. UMAP: Uniform Manifold Approximation and Projection , 2018, J. Open Source Softw..
[118] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[119] David F. Fouhey,et al. Understanding Human Hands in Contact at Internet Scale , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[120] Ling Shao,et al. 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[121] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[122] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[123] Stella X. Yu,et al. Open Compound Domain Adaptation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[124] David J. Fleet,et al. ON THE EFFECTIVENESS OF TASK GRANULARITY FOR TRANSFER LEARNING , 2018, 1804.09235.
[125] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[126] Jessica K. Hodgins,et al. Guide to the Carnegie Mellon University Multimodal Activity (CMU-MMAC) Database , 2008 .