EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos | Arsha Nagrani | Andrew Zisserman | Dima Damen
[1] Ning Qian, et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[2] Cordelia Schmid, et al. Aggregating local descriptors into a compact image representation, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[3] M. Wallace, et al. The effects of visual training on multisensory temporal processing, 2013, Experimental Brain Research.
[4] Yong Jae Lee, et al. Discovering important people and objects for egocentric video summarization, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[5] M. Ernst, et al. When Correlation Implies Causation in Multisensory Integration, 2012, Current Biology.
[6] Deva Ramanan, et al. Detecting activities of daily living in first-person camera views, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[7] John J. Foxe, et al. Recalibration of the Multisensory Temporal Window of Integration Results from Changing Task Demands, 2013, PLoS ONE.
[8] Andrew Zisserman, et al. Two-Stream Convolutional Networks for Action Recognition in Videos, 2014, NIPS.
[9] Dima Damen, et al. You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video, 2014, BMVC.
[10] M. Wallace, et al. The construct of the multisensory temporal binding window and its dysregulation in developmental disabilities, 2014, Neuropsychologia.
[11] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[12] Tamara L. Berg, et al. Temporal Perception and Prediction in Ego-Centric Video, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[13] Yoichi Sato, et al. Recognizing Micro-Actions and Reactions from Paired Egocentric Videos, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] C. V. Jawahar, et al. First Person Action Recognition Using Deep Learned Descriptors, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Kris M. Kitani, et al. Going Deeper into First-Person Activity Recognition, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Joo-Hwee Lim, et al. Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[17] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.
[18] Xi Wang, et al. Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification, 2016, ACM Multimedia.
[19] Andrew Owens, et al. Ambient Sound Provides Supervision for Visual Learning, 2016, ECCV.
[20] Andrew Zisserman, et al. Convolutional Two-Stream Network Fusion for Video Action Recognition, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Luc Van Gool, et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, 2016, ECCV.
[22] Ivan Laptev, et al. Learnable pooling with Context Gating for video classification, 2017, arXiv.
[23] Fabio A. González, et al. Gated Multimodal Units for Information Fusion, 2017, ICLR.
[24] Abhinav Gupta, et al. ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Andrew Zisserman, et al. Look, Listen and Learn, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Dima Damen, et al. Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[28] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.
[30] Andrew Zisserman, et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[31] Lorenzo Torresani, et al. Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization, 2018, NeurIPS.
[32] Ivan Laptev, et al. Learning a Text-Video Embedding from Incomplete and Heterogeneous Data, 2018, arXiv.
[33] Weiyao Lin, et al. Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion, 2018, AAAI.
[34] Xiao Liu, et al. Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Xiao Liu, et al. Multimodal Keyless Attention Fusion for Video Classification, 2018, AAAI.
[36] Andrew Zisserman, et al. A Better Baseline for AVA, 2018, arXiv.
[37] Oswald Lanz, et al. Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition, 2018, BMVC.
[38] Bolei Zhou, et al. Temporal Relational Reasoning in Videos, 2017, ECCV.
[39] Andrew Zisserman, et al. Learnable PINs: Cross-Modal Embeddings for Person Identity, 2018, ECCV.
[40] Tae-Hyun Oh, et al. Learning to Localize Sound Source in Visual Scenes, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[41] Andrew Zisserman, et al. Objects that Sound, 2017, ECCV.
[42] Cordelia Schmid, et al. Actor and Observer: Joint Modeling of First and Third-Person Videos, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[43] Dima Damen, et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset, 2018, arXiv.
[44] Tim K. Marks, et al. Audio Visual Scene-aware dialog (AVSD) Track for Natural Language Generation in DSTC7, 2019.
[45] Kristen Grauman, et al. 2.5D Visual Sound, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).