OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context
[1] Jian Ma,et al. Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100 , 2021, Int. J. Comput. Vis.
[2] James M. Rehg,et al. Ego4D: Around the World in 3,000 Hours of Egocentric Video , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Ravi Kiran Sarvadevabhatla,et al. Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization , 2021, VISIGRAPP.
[4] Shiwei Zhang,et al. End-to-End Temporal Action Detection With Transformer , 2021, IEEE Transactions on Image Processing.
[5] Dima Damen,et al. With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition , 2021, BMVC.
[6] Niamul Quader,et al. Class Semantics-based Attention for Action Detection , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[7] C. Schmid,et al. Attention Bottlenecks for Multimodal Fusion , 2021, NeurIPS.
[8] Marcelo H. Ang,et al. A Stronger Baseline for Ego-Centric Action Detection , 2021, ArXiv.
[9] Rohit Girdhar,et al. Anticipative Video Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[10] Andrew Zisserman,et al. Temporal Query Networks for Fine-grained Video Understanding , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Cordelia Schmid,et al. ViViT: A Video Vision Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[12] Wei Wu,et al. Temporal Context Aggregation Network for Temporal Action Proposal Refinement , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Dima Damen,et al. Slow-Fast Auditory Streams for Audio Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Megha Nawhal,et al. Activity Graph Transformer for Temporal Action Localization , 2021, ArXiv.
[15] Bernard Ghanem,et al. MAAS: Multi-modal Assignation for Active Speaker Detection , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[16] Alejandro Cartas,et al. Modeling Long-Term Interactions to Enhance Action Recognition , 2021, 2020 25th International Conference on Pattern Recognition (ICPR).
[17] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[18] Yang Yang,et al. Boundary Content Graph Neural Network for Temporal Action Proposal Generation , 2020, ECCV.
[19] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[20] Arman Cohan,et al. Longformer: The Long-Document Transformer , 2020, ArXiv.
[21] Zilei Wang,et al. Progressive Boundary Refinement Network for Temporal Action Detection , 2020, AAAI.
[22] Yong Jae Lee,et al. Audiovisual SlowFast Networks for Video Recognition , 2020, ArXiv.
[23] Ali K. Thabet,et al. G-TAD: Sub-Graph Localization for Temporal Action Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Du Tran,et al. What Makes Training Multi-Modal Classification Networks Hard? , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Basura Fernando,et al. Human Action Sequence Classification , 2019, ArXiv.
[26] Runhao Zeng,et al. Graph Convolutional Networks for Temporal Action Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[27] Dima Damen,et al. EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Shilei Wen,et al. BMN: Boundary-Matching Network for Temporal Action Proposal Generation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Giovanni Maria Farinella,et al. What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[30] Kaiming He,et al. Long-Term Feature Banks for Detailed Video Understanding , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Hang Zhao,et al. HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization , 2017, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Ming Yang,et al. BSN: Boundary Sensitive Network for Temporal Action Proposal Generation , 2018, ECCV.
[34] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph.
[35] Dima Damen,et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset , 2018, ArXiv.
[36] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[37] Bernard Ghanem,et al. Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization , 2017, ECCV.
[38] Graham W. Taylor,et al. Deep Multimodal Learning: A Survey on Recent Advances and Trends , 2017, IEEE Signal Processing Magazine.
[39] Larry S. Davis,et al. Temporal Context Network for Activity Localization in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[40] Bernard Ghanem,et al. SCC: Semantic Context Cascade for Efficient Action Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Ivan Laptev,et al. Learnable pooling with Context Gating for video classification , 2017, ArXiv.
[42] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[43] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Larry S. Davis,et al. Soft-NMS — Improving Object Detection with One Line of Code , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[45] Kate Saenko,et al. R-C3D: Region Convolutional 3D Network for Temporal Activity Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[46] Bernard Ghanem,et al. DAPs: Deep Action Proposals for Action Understanding , 2016, ECCV.
[47] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[48] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).