Audio–Visual Deep Clustering for Speech Separation
暂无分享,去创建一个
Zhiyao Duan | Changshui Zhang | Rui Lu | Changshui Zhang | Rui Lu | Z. Duan
[1] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[2] Jonathan Le Roux,et al. Single-Channel Multi-Speaker Separation Using Deep Clustering , 2016, INTERSPEECH.
[3] Zhong-Qiu Wang,et al. End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction , 2018, INTERSPEECH.
[4] Amirsina Torfi,et al. 3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition , 2017, IEEE Access.
[5] Shmuel Peleg,et al. Seeing Through Noise: Speaker Separation and Enhancement using Visually-derived Speech , 2017, ArXiv.
[6] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Guy J. Brown,et al. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .
[8] John R. Hershey,et al. Super-human multi-talker speech recognition: A graphical modeling approach , 2010, Comput. Speech Lang..
[9] Christian Jutten,et al. Visual voice activity detection as a help for speech source separation from convolutive mixtures , 2007, Speech Commun..
[10] Patrick Pérez,et al. Motion informed audio source separation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Changshui Zhang,et al. Listen and Look: Audio–Visual Matching Assisted Speech Source Separation , 2018, IEEE Signal Processing Letters.
[12] Saeid Sanei,et al. Video assisted speech source separation , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[13] Dalia El Badawy,et al. On-the-fly audio source separation , 2014, 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).
[14] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[15] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[16] Dong Yu,et al. Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[17] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[18] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[19] Zhong-Qiu Wang,et al. Alternative Objective Functions for Deep Clustering , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] H. Sebastian Seung,et al. Algorithms for Non-negative Matrix Factorization , 2000, NIPS.
[21] Michael I. Jordan,et al. Factorial Hidden Markov Models , 1995, Machine Learning.
[22] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..
[23] Nima Mesgarani,et al. Speaker-Independent Speech Separation With Deep Attractor Network , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[24] Bhiksha Raj,et al. Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures , 2007, ICA.
[25] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Pierre Vandergheynst,et al. Blind Audiovisual Source Separation Based on Sparse Redundant Representations , 2010, IEEE Transactions on Multimedia.
[27] Jonathon A. Chambers,et al. Audiovisual Speech Source Separation: An overview of key methodologies , 2014, IEEE Signal Processing Magazine.
[28] Ce Liu,et al. Exploring new representations and applications for motion analysis , 2009 .
[29] Yu Tsao,et al. Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks , 2017, IEEE Transactions on Emerging Topics in Computational Intelligence.
[30] Tuomas Virtanen,et al. Speech recognition using factorial hidden Markov models for separation in the feature space , 2006, INTERSPEECH.
[31] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[32] Gregory B. Cogan,et al. Visual Input Enhances Selective Speech Envelope Tracking in Auditory Cortex at a “Cocktail Party” , 2013, The Journal of Neuroscience.
[33] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Wei Ji Ma,et al. Lip-Reading Aids Word Recognition Most in Moderate Noise: A Bayesian Explanation Using High-Dimensional Feature Space , 2009, PloS one.
[35] Christian Jutten,et al. Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli , 2002, EURASIP J. Adv. Signal Process..
[36] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Nima Mesgarani,et al. Deep attractor network for single-microphone speaker separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Josef Kittler,et al. Source Separation of Convolutive and Noisy Mixtures Using Audio-Visual Dictionary Learning and Probabilistic Time-Frequency Masking , 2013, IEEE Transactions on Signal Processing.
[39] Paris Smaragdis,et al. Deep learning for monaural speech separation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Philip J. B. Jackson,et al. Use of bimodal coherence to resolve the permutation problem in convolutive BSS , 2012, Signal Process..
[41] DeLiang Wang,et al. An Unsupervised Approach to Cochannel Speech Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[42] Naomi Harte,et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech , 2015, IEEE Transactions on Multimedia.
[43] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[44] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[45] Richard G. Baraniuk,et al. k-POD: A Method for k-Means Clustering of Missing Data , 2014, 1411.7013.
[46] Paris Smaragdis,et al. Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[47] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[48] Christian Jutten,et al. Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[49] Shmuel Peleg,et al. Visual Speech Enhancement , 2017, INTERSPEECH.
[50] Olivier Cappé,et al. Soft Nonnegative Matrix Co-Factorization , 2014, IEEE Transactions on Signal Processing.
[51] Andrew Zisserman,et al. Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Trevor Darrell,et al. Learning Features by Watching Objects Move , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[54] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[55] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[56] Patrick Pérez,et al. Identify, Locate and Separate: Audio-Visual Object Extraction in Large Video Collections Using Weak Supervision , 2018, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[57] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[58] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.
[59] Paris Smaragdis,et al. Convolutive Speech Bases and Their Application to Supervised Speech Separation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.