Anoop Cherian | Narendra Ahuja | Moitreya Chatterjee | Jonathan Le Roux
[1] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.
[2] Daniel P. W. Ellis, et al. mir_eval: A Transparent Implementation of Common MIR Metrics, 2014, ISMIR.
[3] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[4] Emmanuel Vincent, et al. Audio Source Separation and Speech Enhancement, 2018.
[5] Chuang Gan, et al. The Sound of Pixels, 2018, ECCV.
[6] Rémi Gribonval, et al. Performance measurement in blind audio source separation, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[7] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[8] DeLiang Wang, et al. Supervised Speech Separation Based on Deep Learning: An Overview, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[9] Thomas Brox, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.
[10] Zhuo Chen, et al. Deep clustering: Discriminative embeddings for segmentation and separation, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Hao Li, et al. Using optimal ratio mask as training target for supervised speech separation, 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[12] Jan-Michael Frahm, et al. Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs, 2008, International Journal of Computer Vision.
[13] Jonathan Le Roux, et al. Discriminative NMF and its application to single-channel source separation, 2014, INTERSPEECH.
[14] Kevin Wilson, et al. Looking to Listen at the Cocktail Party, 2018, ACM Trans. Graph.
[15] Philipos C. Loizou, et al. Speech Enhancement: Theory and Practice, 2007.
[16] D. Wang, et al. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, 2006, IEEE Trans. Neural Networks.
[17] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[18] Tae-Hyun Oh, et al. Learning to Localize Sound Source in Visual Scenes, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Rogério Schmidt Feris, et al. Learning to Separate Object Sounds by Watching Unlabeled Video, 2018, ECCV.
[20] Jonathan Le Roux, et al. Finding Strength in Weakness: Learning to Separate Sounds With Weak Supervision, 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[21] Björn W. Schuller, et al. Discriminatively trained recurrent neural networks for single-channel speech separation, 2014, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP).
[22] Yue Wang, et al. Dynamic Graph CNN for Learning on Point Clouds, 2018, ACM Trans. Graph.
[23] Jordi Pont-Tuset, et al. The Open Images Dataset V4, 2018, International Journal of Computer Vision.
[24] Kristen Grauman, et al. Co-Separating Sounds of Visual Objects, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[25] Jaewoo Kang, et al. Self-Attention Graph Pooling, 2019, ICML.
[26] Michael Elad, et al. Pixels that sound, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[27] Juan Carlos Niebles, et al. Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Chen Fang, et al. Visual to Sound: Generating Natural Sound for Videos in the Wild, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] DeLiang Wang, et al. On the optimality of ideal binary time-frequency masks, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[30] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[31] Andrew Zisserman, et al. Objects that Sound, 2017, ECCV.
[32] Paris Smaragdis, et al. Static and Dynamic Source Separation Using Nonnegative Factorizations: A unified view, 2014, IEEE Signal Processing Magazine.
[33] Gabriel Meseguer-Brocal, et al. Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations, 2019, ISMIR.
[34] Paris Smaragdis, et al. Deep learning for monaural speech separation, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Andrew Owens, et al. Self-Supervised Learning of Audio-Visual Objects from Video, 2020, ECCV.
[36] Jonathan Le Roux, et al. Single-Channel Multi-Speaker Separation Using Deep Clustering, 2016, INTERSPEECH.
[37] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Jesper Jensen, et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Emmanuel Vincent, et al. A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[40] Efthymios Tzinis, et al. Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds, 2020, ICLR.
[41] Jun Du, et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks, 2014, IEEE Signal Processing Letters.
[42] Nuno Vasconcelos, et al. Self-Supervised Generation of Spatial Audio for 360° Video, 2018, NeurIPS.
[43] DeLiang Wang, et al. Divide and Conquer: A Deep CASA Approach to Talker-Independent Monaural Speaker Separation, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[44] Peng Gao, et al. Spatio-Temporal Scene Graphs for Video Dialog, 2020, arXiv.
[45] Chuang Gan, et al. The Sound of Motions, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[46] Gunhee Kim, et al. AudioCaps: Generating Captions for Audios in The Wild, 2019, NAACL.
[47] Tillman Weyde, et al. Singing Voice Separation with Deep U-Net Convolutional Networks, 2017, ISMIR.
[48] Pietro Liò, et al. Graph Attention Networks, 2017, ICLR.
[49] Javier R. Movellan, et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds, 1999, NIPS.
[50] Dahua Lin, et al. Recursive Visual Sound Separation Using Minus-Plus Net, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[51] Michael S. Bernstein, et al. Image retrieval using scene graphs, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Emilia Gómez, et al. End-to-end Sound Source Separation Conditioned on Instrument Labels, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Chuang Gan, et al. Music Gesture for Visual Sound Separation, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, arXiv.
[55] Efthymios Tzinis, et al. Unsupervised Sound Separation Using Mixtures of Mixtures, 2020, arXiv.
[56] Aren Jansen, et al. Audio Set: An ontology and human-labeled dataset for audio events, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[57] Andrew Owens, et al. Visually Indicated Sounds, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Z. Tan, et al. An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation, 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[59] Pierre Comon, et al. Handbook of Blind Source Separation: Independent Component Analysis and Applications, 2010.