Improving Audio-Visual Video Parsing with Pseudo Visual Labels