Multimodal Attention Fusion for Target Speaker Extraction
Tomohiro Nakatani | Shoko Araki | Marc Delcroix | Keisuke Kinoshita | Hiroshi Sato | Tsubasa Ochiai