暂无分享,去创建一个
Joon Son Chung | Soyeon Choe | Soo-Whan Chung | Hong-Goo Kang | Soyeon Choe | Hong-Goo Kang | Soo-Whan Chung
[1] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Andrew Zisserman,et al. Learnable PINs: Cross-Modal Embeddings for Person Identity , 2018, ECCV.
[4] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[5] Nima Mesgarani,et al. TaSNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Tae-Hyun Oh,et al. Speech2Face: Learning the Face Behind a Voice , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Joon Son Chung,et al. My lips are concealed: Audio-visual speech enhancement through obstructions , 2019, INTERSPEECH.
[8] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[9] N. Mesgarani,et al. Selective cortical representation of attended speaker in multi-talker speech perception , 2012, Nature.
[10] Tomohiro Nakatani,et al. Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues , 2019, INTERSPEECH.
[11] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[12] Andrew Zisserman,et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Joon Son Chung,et al. The Sound of My Voice: Speaker Representation Loss for Target Voice Separation , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Tae-Hyun Oh,et al. On Learning Associations of Faces and Voices , 2018, ACCV.
[15] Joon Son Chung,et al. Learning to lip read words by watching videos , 2018, Comput. Vis. Image Underst..
[16] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[18] Bowen Zhou,et al. A Structured Self-attentive Sentence Embedding , 2017, ICLR.
[19] Joon Son Chung,et al. Disentangled Speech Embeddings Using Cross-Modal Self-Supervision , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[21] Dong Yu,et al. Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[22] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Nima Mesgarani,et al. Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[24] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[25] Shmuel Peleg,et al. Visual Speech Enhancement , 2017, INTERSPEECH.
[26] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.
[27] Tomohiro Nakatani,et al. SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures , 2019, IEEE Journal of Selected Topics in Signal Processing.
[28] Joon Son Chung,et al. Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval , 2020, IEEE Journal of Selected Topics in Signal Processing.
[29] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Chng Eng Siong,et al. Time-Domain Speaker Extraction Network , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[31] Zhiyao Duan,et al. Audio–Visual Deep Clustering for Speech Separation , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[32] Sanjeev Khudanpur,et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[35] Nima Mesgarani,et al. Speaker-Independent Speech Separation With Deep Attractor Network , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[36] John R. Hershey,et al. VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking , 2018, INTERSPEECH.