Joon Son Chung | Andrew Zisserman | Jaesung Huh | Triantafyllos Afouras | Arsha Nagrani
[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[2] H. Edelsbrunner,et al. Efficient algorithms for agglomerative hierarchical clustering methods , 1984 .
[3] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Jason W. Pelecanos,et al. Online speaker diarization using adapted i-vector transforms , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Daniel C. Burnett,et al. WebRTC: APIs and RTCWEB Protocols of the HTML5 Real-Time Web , 2012 .
[6] Joon Son Chung,et al. In defence of metric learning for speaker recognition , 2020, INTERSPEECH.
[7] Jean Carletta,et al. The AMI Meeting Corpus: A Pre-announcement , 2005, MLMI.
[8] Daniel Garcia-Romero,et al. Speaker diarization with PLDA i-vector scoring and unsupervised calibration , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[9] Shinji Watanabe,et al. Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge , 2018, INTERSPEECH.
[10] Sanjeev Khudanpur,et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Hervé Bourlard,et al. Improved overlap speech diarization of meeting recordings using long-term conversational features , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[12] Abhishek Dutta,et al. The VIA Annotation Software for Images, Audio and Video , 2019, ACM Multimedia.
[13] Joon Son Chung,et al. Voxceleb: Large-scale speaker verification in the wild , 2020, Comput. Speech Lang.
[14] Joon Son Chung,et al. Utterance-level Aggregation for Speaker Recognition in the Wild , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Quan Wang,et al. Fully Supervised Speaker Diarization , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[17] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[18] Kenneth Ward Church,et al. The Second DIHARD Diarization Challenge: Dataset, task, and baselines , 2019, INTERSPEECH.
[19] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[20] Jun Du,et al. Speaker Diarization with Enhancing Speech for the First DIHARD Challenge , 2018, INTERSPEECH.
[21] Shifeng Zhang,et al. S^3FD: Single Shot Scale-Invariant Face Detector , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[22] Alan McCree,et al. Speaker diarization using deep neural network embeddings , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Andreas Stolcke,et al. The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
[25] Joon Son Chung,et al. My lips are concealed: Audio-visual speech enhancement through obstructions , 2019, INTERSPEECH.
[26] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[27] James H. Martin,et al. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition , 2000 .
[28] Joon Son Chung,et al. Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[30] Joon Son Chung,et al. Lip Reading in Profile , 2017, BMVC.
[31] Abhishek Dutta,et al. The VGG Image Annotator (VIA) , 2019, ArXiv.
[32] Douglas A. Reynolds,et al. Speaker diarisation for broadcast news , 2004, Odyssey.
[33] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[34] Joon Son Chung,et al. Who said that?: Audio-visual speaker diarisation of real-world meetings , 2019, INTERSPEECH.
[35] Andrew Zisserman,et al. VGGSound: A Large-Scale Audio-Visual Dataset , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Sergey Ioffe,et al. Probabilistic Linear Discriminant Analysis , 2006, ECCV.
[37] Themos Stafylakis,et al. PLDA for speaker verification with utterances of arbitrary duration , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.