Exploiting Visual Context Semantics for Sound Source Localization
暂无分享,去创建一个
[1] Hung-Yu Tseng,et al. Unsupervised Sound Localization via Iterative Contrastive Learning , 2021, Comput. Vis. Image Underst..
[2] Yapeng Tian,et al. Learning in Audio-visual Context: A Review, Analysis, and New Perspective , 2022, ArXiv.
[3] Xavier Alameda-Pineda,et al. A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] T. Tan,et al. Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes , 2022, ArXiv.
[5] Junsik Kim,et al. Learning Sound Localization Better from Semantically Similar Samples , 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Junsik Kim,et al. Less Can Be More: Sound Source Localization With a Classification Model , 2022, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
[7] Chao Ma,et al. Unsupervised Sounding Object Localization with Bottom-Up and Top-Down Attention , 2022, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
[8] Yuki M. Asano,et al. Self-supervised object detection from audio-visual correspondence , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Leonid Sigal,et al. TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation , 2021, ArXiv.
[10] Robert Harb,et al. InfoSeg: Unsupervised Semantic Image Segmentation with Mutual Information Maximization , 2021, GCPR.
[11] Anoop Cherian,et al. Visual Scene Graphs for Audio Source Separation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[12] Andrea Vedaldi,et al. Localizing Visual Sounds the Hard Way , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Wouter Van Gansbeke,et al. Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[14] Kristen Grauman,et al. VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Rui Wang,et al. Deep Audio-visual Learning: A Survey , 2020, International Journal of Automation and Computing.
[16] Weiyao Lin,et al. Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching , 2020, NeurIPS.
[17] Andrew Owens,et al. Self-Supervised Learning of Audio-Visual Objects from Video , 2020, ECCV.
[18] Weiyao Lin,et al. Multiple Sound Sources Localization from Coarse to Fine , 2020, ECCV.
[19] Andrew Zisserman,et al. Vggsound: A Large-Scale Audio-Visual Dataset , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Chuang Gan,et al. Music Gesture for Visual Sound Separation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Bernard Ghanem,et al. Self-Supervised Learning by Cross-Modal Audio-Video Clustering , 2019, NeurIPS.
[22] Mengmi Zhang,et al. Putting Visual Object Recognition in Context , 2019, Computer Vision and Pattern Recognition.
[23] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[24] Jitendra Malik,et al. Learning Individual Styles of Conversational Gesture , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Tae-Hyun Oh,et al. Speech2Face: Learning the Face Behind a Voice , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Kristen Grauman,et al. Co-Separating Sounds of Visual Objects , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[27] Chuang Gan,et al. The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Shuicheng Yan,et al. Graph-Based Global Reasoning Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Xuelong Li,et al. Deep Multimodal Clustering for Unsupervised Audiovisual Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Lorenzo Torresani,et al. Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization , 2018, NeurIPS.
[31] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[32] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[33] Rogério Schmidt Feris,et al. Learning to Separate Object Sounds by Watching Unlabeled Video , 2018, ECCV.
[34] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[36] Chen Fang,et al. Visual to Sound: Generating Natural Sound for Videos in the Wild , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[37] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[38] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[39] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[41] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[42] Koen E. A. van de Sande,et al. Selective Search for Object Recognition , 2013, International Journal of Computer Vision.
[43] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[44] L. Shams,et al. Crossmodal influences on visual perception. , 2010, Physics of life reviews.
[45] C. Spence,et al. Multisensory Integration: Space, Time and Superadditivity , 2005, Current Biology.
[46] David Poeppel,et al. Visual speech speeds up the neural processing of auditory speech. , 2005, Proceedings of the National Academy of Sciences of the United States of America.
[47] N. Bolognini,et al. Enhancement of visual perception by crossmodal visuo-auditory interaction , 2002, Experimental Brain Research.