Mix and Localize: Localizing Sound Sources in Mixtures
暂无分享,去创建一个
[1] Alexei A. Efros,et al. Learning Pixel Trajectories with Multiscale Contrastive Random Walks , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Andrew Owens,et al. Structure from Silence: Learning Scene Structure from Ambient Sound , 2021, CoRL.
[3] K. K. Rachavarapu,et al. Localize to Binauralize: Audio Spatialization from Visual Sound Source Localization , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[4] Anoop Cherian,et al. Visual Scene Graphs for Audio Source Separation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[5] Kristen Grauman,et al. Move2Hear: Active Audio-Visual Source Separation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[6] Andrea Vedaldi,et al. Localizing Visual Sounds the Hard Way , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Yapeng Tian,et al. Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Abhinav Valada,et al. There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Kristen Grauman,et al. VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Weiyao Lin,et al. Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching , 2020, NeurIPS.
[11] Andrew Owens,et al. Self-Supervised Learning of Audio-Visual Objects from Video , 2020, ECCV.
[12] Weiyao Lin,et al. Multiple Sound Sources Localization from Coarse to Fine , 2020, ECCV.
[13] Alexei A. Efros,et al. Space-Time Correspondence as a Contrastive Random Walk , 2020, NeurIPS.
[14] Andrea Vedaldi,et al. Labelling unlabelled videos from scratch with multi-modal self-supervision , 2020, NeurIPS.
[15] Ron J. Weiss,et al. Unsupervised Sound Separation Using Mixture Invariant Training , 2020, NeurIPS.
[16] Justin Salamon,et al. Telling Left From Right: Learning Spatial Correspondence of Sight and Sound , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Kristen Grauman,et al. VisualEchoes: Spatial Image Representation Learning through Echolocation , 2020, ECCV.
[18] Andrew Zisserman,et al. Vggsound: A Large-Scale Audio-Visual Dataset , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Yong Jae Lee,et al. Audiovisual SlowFast Networks for Video Recognition , 2020, ArXiv.
[20] Seong Joon Oh,et al. Evaluating Weakly Supervised Object Localization Methods Right , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Chuang Gan,et al. Self-supervised Audio-visual Co-segmentation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Chuang Gan,et al. The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Allan Jabri,et al. Learning Correspondence From the Cycle-Consistency of Time , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Xuelong Li,et al. Deep Multimodal Clustering for Unsupervised Audiovisual Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] James R. Glass,et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input , 2018, International Journal of Computer Vision.
[26] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[27] Sergio Guadarrama,et al. Tracking Emerges by Colorizing Videos , 2018, ECCV.
[28] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[29] Lorenzo Torresani,et al. Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization , 2018, NeurIPS.
[30] Stella X. Yu,et al. Unsupervised Feature Learning via Non-parametric Instance Discrimination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[31] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[32] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[33] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[34] Andrew Zisserman,et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[36] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[37] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[38] Tillman Weyde,et al. Singing Voice Separation with Deep U-Net Convolutional Networks , 2017, ISMIR.
[39] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[40] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[42] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[43] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[44] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Bolei Zhou,et al. Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[48] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[49] Andrzej Cichocki,et al. Nonnegative Matrix and Tensor Factorization T , 2007 .
[50] Sam T. Roweis,et al. One Microphone Source Separation , 2000, NIPS.
[51] H S Colburn,et al. Speech intelligibility and localization in a multi-source environment. , 1999, The Journal of the Acoustical Society of America.
[52] Jitendra Malik,et al. Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.