Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization
暂无分享,去创建一个
Runhao Zeng | Chuang Gan | Mingkui Tan | Qingyao Wu | Haoming Xu | Chuang Gan | Mingkui Tan | Qingyao Wu | Runhao Zeng | Haoming Xu
[1] Jiajun Wu,et al. Deep multiple instance learning for image classification and auto-annotation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[3] Stavroula G. Mougiakakou,et al. Self-Attention and Ingredient-Attention Based Model for Recipe Retrieval from Image Queries , 2019, MADiMa @ ACM Multimedia.
[4] Rita Cucchiara,et al. Meshed-Memory Transformer for Image Captioning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Chuang Gan,et al. Self-supervised Audio-visual Co-segmentation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Runhao Zeng,et al. Location-Aware Graph Convolutional Networks for Video Question Answering , 2020, AAAI.
[7] Tae-Hyun Oh,et al. Speech2Face: Learning the Face Behind a Voice , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[9] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Futai Zou,et al. Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization , 2019, ACM Multimedia.
[12] Chuang Gan,et al. The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Yi Yang,et al. DevNet: A Deep Event Network for multimedia event detection and evidence recounting , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[15] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[16] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[17] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[18] Dima Damen,et al. EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[19] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[20] Chuang Gan,et al. Self-Supervised Moving Vehicle Tracking With Stereo Sound , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[21] Tae-Hyun Oh,et al. Listen to Look: Action Recognition by Previewing Audio , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Zhou Yu,et al. Deep Modular Co-Attention Networks for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[24] Qingming Huang,et al. Learning Fragment Self-Attention Embeddings for Image-Text Matching , 2019, ACM Multimedia.
[25] Tat-Seng Chua,et al. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Yichen Wei,et al. Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[27] Byoung-Tak Zhang,et al. Dual Attention Networks for Visual Reference Resolution in Visual Dialog , 2019, EMNLP.
[28] Chuang Gan,et al. Foley Music: Learning to Generate Music from Videos , 2020, ECCV.
[29] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[30] David A. Forsyth,et al. Video Event Detection: From Subvolume Localization to Spatiotemporal Path Search , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[31] Yan Yan,et al. Dual Attention Matching for Audio-Visual Event Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Runhao Zeng,et al. Dense Regression Network for Video Grounding , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Runhao Zeng,et al. Graph Convolutional Networks for Temporal Action Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[34] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[36] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Yu-Chiang Frank Wang,et al. Dual-modality Seq2Seq Network for Audio-visual Event Localization , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[39] Runhao Zeng,et al. Relation Attention for Temporal Action Localization , 2020, IEEE Transactions on Multimedia.
[40] Runhao Zeng,et al. Breaking Winner-Takes-All: Iterative-Winners-Out Networks for Weakly Supervised Temporal Action Localization , 2019, IEEE Transactions on Image Processing.
[41] Chuang Gan,et al. Music Gesture for Visual Sound Separation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Gregory D. Hager,et al. Temporal Convolutional Networks for Action Segmentation and Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Yang Wang,et al. Cross-Modal Self-Attention Network for Referring Image Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).