Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Yanfeng Wang | Jinxian Liu | Ya Zhang | Yu Wang | Chen Ju | Chao Ma
[1] N. Barnes,et al. Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning , 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Stan Birchfield,et al. Audio-Visual Segmentation , 2022, ECCV.
[3] Weidi Xie,et al. Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation , 2022, ACM Multimedia.
[4] D. Clifton,et al. Multimodal Learning With Transformers: A Survey , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] T. Tan,et al. Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes , 2022, ArXiv.
[6] S. Song,et al. Vision Transformer with Deformable Attention , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Nick Barnes,et al. Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction , 2021, NeurIPS.
[8] Ruihua Song,et al. Class-Aware Sounding Objects Localization via Audiovisual Correspondence , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Huchuan Lu,et al. Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation , 2021, IEEE Transactions on Neural Networks and Learning Systems.
[10] P. Luo,et al. PVT v2: Improved baselines with Pyramid Vision Transformer , 2021, Computational Visual Media.
[11] Yann LeCun,et al. MDETR - Modulated Detection for End-to-End Multi-Modal Understanding , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[12] Andrea Vedaldi,et al. Localizing Visual Sounds the Hard Way , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Hung-Yu Tseng,et al. Unsupervised Sound Localization via Iterative Contrastive Learning , 2021, Comput. Vis. Image Underst.
[14] Parham Aarabi,et al. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Weiyao Lin,et al. Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching , 2020, NeurIPS.
[16] Bin Li,et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection , 2020, ICLR.
[17] B. Leibe,et al. Making a Case for 3D Convolutions for Object Segmentation in Videos , 2020, BMVC.
[18] Andrew Owens,et al. Self-Supervised Learning of Audio-Visual Objects from Video , 2020, ECCV.
[19] Weiyao Lin,et al. Multiple Sound Sources Localization from Coarse to Fine , 2020, ECCV.
[20] Tao Kong,et al. SOLOv2: Dynamic and Fast Instance Segmentation , 2020, NeurIPS.
[21] Hao Chen,et al. Conditional Convolutions for Instance Segmentation , 2020, ECCV.
[22] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[23] Xuelong Li,et al. Deep Multimodal Clustering for Unsupervised Audiovisual Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[26] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[28] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[29] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Serge J. Belongie,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Seyed-Ahmad Ahmadi,et al. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).
[33] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[35] Yuchao Dai,et al. Transformer Transforms Salient Object Detection and Camouflaged Object Detection , 2021, ArXiv.
[36] M. Pantic,et al. Active Speaker Detection and Localization in Videos Using Low-Rank and Kernelized Sparsity , 2020, IEEE Signal Processing Letters.