Multi-Modal Multi-Channel Target Speech Separation
暂无分享,去创建一个
Dong Yu | Yong Xu | Shi-Xiong Zhang | Yuexian Zou | Lianwu Chen | Rongzhi Gu | Dong Yu | Rongzhi Gu | Shi-Xiong Zhang | Yong Xu | Yuexian Zou | Lianwu Chen
[1] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[2] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Lei Xie,et al. Time Domain Audio Visual Speech Separation , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[4] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[5] Shmuel Peleg,et al. Seeing Through Noise: Visually Driven Speaker Separation And Enhancement , 2017, ICASSP.
[6] Joon Son Chung,et al. My lips are concealed: Audio-visual speech enhancement through obstructions , 2019, INTERSPEECH.
[7] Colin Cherry,et al. Contribution to a Study of the “Cocktail Party Problem” , 1960 .
[8] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[10] Xiong Xiao,et al. Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Tomohiro Nakatani,et al. Context adaptive deep neural networks for fast acoustic model adaptation , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[13] E. Lehmann,et al. Prediction of energy decay in room impulse responses simulated with an image-source model. , 2008, The Journal of the Acoustical Society of America.
[14] Soumitro Chakrabarty,et al. Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals , 2018, IEEE Journal of Selected Topics in Signal Processing.
[15] Tomohiro Nakatani,et al. Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures , 2017, INTERSPEECH.
[16] Tomohiro Nakatani,et al. Learning speaker representation for neural network based multichannel speaker extraction , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[17] John H. L. Hansen,et al. Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[18] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Erik Cambria,et al. Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling , 2018, Knowl. Based Syst..
[20] John R. Hershey,et al. VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking , 2018, INTERSPEECH.
[21] Jonathan Le Roux,et al. SDR – Half-baked or Well Done? , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Zhiyao Duan,et al. Audio–Visual Deep Clustering for Speech Separation , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[23] Yong Xu,et al. A comprehensive study of speech separation: spectrogram vs waveform separation , 2019, INTERSPEECH.
[24] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[25] Dong Yu,et al. FACTORIZED DEEP NEURAL NETWORKS FOR ADAPTIVE SPEECH RECOGNITION , 2012 .
[26] Tomohiro Nakatani,et al. SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures , 2019, IEEE Journal of Selected Topics in Signal Processing.
[27] Olivier Galibert,et al. Methodologies for the evaluation of speaker diarization and automatic speech recognition in the presence of overlapping speech , 2013, INTERSPEECH.
[28] Zhong-Qiu Wang,et al. Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Jun Du,et al. Speech separation of a target speaker based on deep neural networks , 2014, 2014 12th International Conference on Signal Processing (ICSP).
[30] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[31] Yifan Gong,et al. End-to-End attention based text-dependent speaker verification , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[32] Zhuo Chen,et al. Single-channel Speech Extraction Using Speaker Inventory and Attention Network , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] DeLiang Wang,et al. Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[34] Nima Mesgarani,et al. TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation. , 2018 .
[35] Jun Wang,et al. Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures , 2018, INTERSPEECH.
[36] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Xiong Xiao,et al. Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[38] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[39] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Nima Mesgarani,et al. Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[41] Dong Yu,et al. Multi-band PIT and Model Integration for Improved Multi-channel Speech Separation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Jonathan Le Roux,et al. Phase Reconstruction with Learned Time-Frequency Representations for Single-Channel Speech Separation , 2018, 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC).
[43] Deliang Wang,et al. On Spatial Features for Supervised Speech Separation and its Application to Beamforming and Robust ASR , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[45] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..
[46] Dong Yu,et al. Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information , 2019, INTERSPEECH.
[47] DeLiang Wang,et al. Combining Spectral and Spatial Features for Deep Learning Based Blind Speaker Separation , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[48] Shmuel Peleg,et al. Improved Speech Reconstruction from Silent Video , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).
[49] Zhong-Qiu Wang,et al. End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction , 2018, INTERSPEECH.
[50] Zhong-Qiu Wang,et al. Alternative Objective Functions for Deep Clustering , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).