Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization
Dan Xu | Ying Shan | Fa-Ting Hong | Wei-Shi Zheng | Jia-Chang Feng
[1] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Yadong Mu,et al. Weakly-Supervised Action Localization by Generative Attention Modeling , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Lei Zhang,et al. AutoLoc: Weakly-supervised Temporal Action Localization , 2018, ECCV.
[4] Weishi Zheng,et al. MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection , 2020, ECCV.
[5] Changsheng Li,et al. Multi-Instance Multi-Label Action Recognition and Localization Based on Spatio-Temporal Pre-Trimming for Untrimmed Videos , 2020, AAAI.
[6] Bernard Ghanem,et al. TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks , 2020, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
[7] Ling Shao,et al. 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[9] Fei Wu,et al. Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection , 2018, AAAI.
[10] Mubarak Shah,et al. Real-World Anomaly Detection in Surveillance Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[11] Andrew Owens,et al. Self-Supervised Learning of Audio-Visual Objects from Video , 2020, ECCV.
[12] Ashraful Islam,et al. Weakly Supervised Temporal Action Localization Using Deep Metric Learning , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[13] Rahul Sukthankar,et al. Rethinking the Faster R-CNN Architecture for Temporal Action Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[14] Junsong Yuan,et al. Pruning 3D Filters For Accelerating 3D ConvNets , 2020, IEEE Transactions on Multimedia.
[15] Zhe Gan,et al. Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Nicu Sebe,et al. PAD-Net: Multi-tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Bolei Zhou,et al. A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Youngjung Uh,et al. Background Suppression Network for Weakly-supervised Temporal Action Localization , 2020, ArXiv.
[19] Megha Nawhal,et al. Activity Graph Transformer for Temporal Action Localization , 2021, ArXiv.
[20] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[22] Kyle Min,et al. Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization , 2020, ECCV.
[23] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[24] Shih-Fu Chang,et al. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Daochang Liu,et al. Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Bohyung Han,et al. Weakly Supervised Action Localization by Sparse Temporal Pooling Network , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[27] Wolfram Burgard,et al. Self-Supervised Model Adaptation for Multimodal Semantic Segmentation , 2018, International Journal of Computer Vision.
[28] Amit K. Roy-Chowdhury,et al. W-TALC: Weakly-supervised Temporal Activity Localization and Classification , 2018, ECCV.
[29] Cees G. M. Snoek,et al. ActionBytes: Learning From Trimmed Videos to Localize Actions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Fa-Ting Hong,et al. MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Ming Yang,et al. BSN: Boundary Sensitive Network for Temporal Action Proposal Generation , 2018, ECCV.
[32] Jian-Huang Lai,et al. Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos , 2020, ACM Multimedia.
[33] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[34] Liang Wang,et al. Cross-Modal Cross-Domain Moment Alignment Network for Person Search , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Gourab Kundu,et al. SF-Net: Single-Frame Supervision for Temporal Action Localization , 2020, ECCV.
[36] Dima Damen,et al. Multi-Modal Domain Adaptation for Fine-Grained Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[37] Hyunjung Shim,et al. Attention-Based Dropout Layer for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[39] Wenwu Zhu,et al. Learning Compact Hash Codes for Multimodal Representations Using Orthogonal Deep Structure , 2015, IEEE Transactions on Multimedia.
[40] Xinbo Gao,et al. Triplet-Based Deep Hashing Network for Cross-Modal Retrieval , 2018, IEEE Transactions on Image Processing.
[41] Gang Hua,et al. Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization , 2020, ECCV.
[42] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[43] Nicu Sebe,et al. Learning Cross-Modal Deep Representations for Robust Pedestrian Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Tao Xiang,et al. Boundary-sensitive Pre-training for Temporal Localization in Videos , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[45] Limin Wang,et al. Temporal Action Detection with Structured Segment Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[46] Nicu Sebe,et al. Learning Deep Representations of Appearance and Motion for Anomalous Event Detection , 2015, BMVC.
[47] Hyeran Byun,et al. Weakly-supervised Temporal Action Localization by Uncertainty Modeling , 2020, AAAI.
[48] Runhao Zeng,et al. Graph Convolutional Networks for Temporal Action Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[49] Yadong Mu,et al. Learning Temporal Co-Attention Models for Unsupervised Video Action Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Gang Hua,et al. ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization , 2021, AAAI.
[51] Bernard Ghanem,et al. RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization , 2019, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[52] Chengjiang Long,et al. A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization , 2021, AAAI.