Audio-Visual Segmentation with Semantics
Stan Birchfield | Lingpeng Kong | Jing Zhang | Xuyang Shen | Dan Guo | Meng Wang | Jiayi Zhang | Yiran Zhong | Weixuan Sun | Jianyuan Wang | Jinxing Zhou
[1] Zhengjun Zha, et al. Semantic and Relation Modulation for Audio-Visual Event Localization, 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Fumin Shen, et al. DHHN: Dual Hierarchical Hybrid Network for Weakly-Supervised Audio-Visual Video Parsing, 2022, ACM Multimedia.
[3] Yapeng Tian, et al. Learning in Audio-visual Context: A Review, Analysis, and New Perspective, 2022, ArXiv.
[4] Stan Birchfield, et al. Audio-Visual Segmentation, 2022, ECCV.
[5] Chen Qian, et al. Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing, 2022, ECCV.
[6] Jiannan Wu, et al. Language as Queries for Referring Video Object Segmentation, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Evgenii Zheltonozhskii, et al. End-to-End Referring Video Object Segmentation with Multimodal Transformers, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Yuejie Zhang, et al. MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing, 2021, ACM Multimedia.
[9] P. Luo, et al. PVT v2: Improved baselines with Pyramid Vision Transformer, 2021, Computational Visual Media.
[10] Nick Barnes, et al. Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction, 2021, NeurIPS.
[11] Yi Yang, et al. Associating Objects with Transformers for Video Object Segmentation, 2021, NeurIPS.
[12] Yu Wu, et al. Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Anima Anandkumar, et al. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers, 2021, NeurIPS.
[14] Andrea Vedaldi, et al. Localizing Visual Sounds the Hard Way, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Shijie Hao, et al. Positive Sample Propagation along the Audio-Visual Event Line, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Parham Aarabi, et al. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Ming-Hsuan Yang, et al. Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing, 2021, NeurIPS.
[18] Yuchao Dai, et al. Transformer Transforms Salient Object Detection and Camouflaged Object Detection, 2021, ArXiv.
[19] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Runhao Zeng, et al. Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization, 2020, ACM Multimedia.
[21] Weiyao Lin, et al. Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching, 2020, NeurIPS.
[22] Laura Leal-Taixé, et al. Making a Case for 3D Convolutions for Object Segmentation in Videos, 2020, BMVC.
[23] Radomír Mech, et al. Unsupervised Video Object Segmentation with Joint Hotspot Tracking, 2020, ECCV.
[24] Ruize Wang, et al. Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning, 2020, ACM Multimedia.
[25] Andrew Owens, et al. Self-Supervised Learning of Audio-Visual Objects from Video, 2020, ECCV.
[26] Chenliang Xu, et al. Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, 2020, ECCV.
[27] Weiyao Lin, et al. Multiple Sound Sources Localization from Coarse to Fine, 2020, ECCV.
[28] Janani Ramaswamy, et al. What Makes the Sound?: A Dual-Modality Interacting Network for Audio-Visual Event Localization, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Andrew Zisserman, et al. VGGSound: A Large-Scale Audio-Visual Dataset, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Hao Shen, et al. CenterMask: Single Shot Instance Segmentation With Point Representation, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Sukhendu Das, et al. See the Sound, Hear the Pixels, 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[32] Yu-Chiang Frank Wang, et al. Audiovisual Transformer with Instance Attention for Audio-Visual Event Localization, 2020, ACCV.
[33] Bohyung Han, et al. URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark, 2020, ECCV.
[34] Yan Yan, et al. Dual Attention Matching for Audio-Visual Event Localization, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[35] Chuang Gan, et al. Self-supervised Audio-visual Co-segmentation, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Kristen Grauman, et al. Co-Separating Sounds of Visual Objects, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[37] Chuang Gan, et al. The Sound of Motions, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Miriam Bellver, et al. RVOS: End-To-End Recurrent Network for Video Object Segmentation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Yu-Chiang Frank Wang, et al. Dual-modality Seq2Seq Network for Audio-visual Event Localization, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Kaiming He, et al. Panoptic Feature Pyramid Networks, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Xuelong Li, et al. Deep Multimodal Clustering for Unsupervised Audiovisual Learning, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Sanyuan Zhao, et al. Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection, 2018, ECCV.
[43] Hongdong Li, et al. 3D Geometry-Aware Semantic Labeling of Outdoor Street Scenes, 2018, 2018 24th International Conference on Pattern Recognition (ICPR).
[44] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.
[45] Chuang Gan, et al. The Sound of Pixels, 2018, ECCV.
[46] Luc Van Gool, et al. Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Rogério Schmidt Feris, et al. Learning to Separate Object Sounds by Watching Unlabeled Video, 2018, ECCV.
[48] Chenliang Xu, et al. Audio-Visual Event Localization in Unconstrained Videos, 2018, ECCV.
[49] Bernt Schiele, et al. Video Object Segmentation with Language Referring Expressions, 2018, ACCV.
[50] Tae-Hyun Oh, et al. Learning to Localize Sound Source in Visual Scenes, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[51] Andrew Zisserman, et al. Objects that Sound, 2017, ECCV.
[52] Abhinav Gupta, et al. Non-local Neural Networks, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[53] Iasonas Kokkinos, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[54] Alexander G. Schwing, et al. MaskRNN: Instance Level Video Object Segmentation, 2018, NIPS.
[55] Jan Kautz, et al. Learning to Segment Instances in Videos with Spatial Propagation Network, 2017, ArXiv.
[56] Andrew Zisserman, et al. Look, Listen and Learn, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[57] Cordelia Schmid, et al. SfM-Net: Learning of Structure and Motion from Video, 2017, ArXiv.
[58] Aren Jansen, et al. Audio Set: An ontology and human-labeled dataset for audio events, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[59] Karteek Alahari, et al. Learning Motion Patterns in Videos, 2016, CVPR.
[60] Luc Van Gool, et al. One-Shot Video Object Segmentation, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Abhishek Das, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[62] Aren Jansen, et al. CNN architectures for large-scale audio classification, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[63] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.
[64] Luc Van Gool, et al. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[65] Bolei Zhou, et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[66] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Thomas Brox, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.
[68] Trevor Darrell, et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[69] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[70] Michal Irani, et al. Video Segmentation by Non-Local Consensus Voting, 2014, BMVC.
[71] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.