End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection
Shinji Watanabe | Shota Horiguchi | Kenji Nagamatsu | Yuki Takashima | Leibny Paola García-Perera | Yusuke Fujita | Paola García
[1] Petr Fousek,et al. Developing On-Line Speaker Diarization System , 2017, INTERSPEECH.
[2] Cheung-Chi Leung,et al. Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Philip C. Woodland,et al. Discriminative Neural Clustering for Speaker Diarisation , 2019, ArXiv.
[4] Quan Wang,et al. Generalized End-to-End Loss for Speaker Verification , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Delbert Dueck,et al. Clustering by Passing Messages Between Data Points , 2007, Science.
[6] Reinhold Häb-Umbach,et al. An Investigation into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[7] Shinji Watanabe,et al. Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals , 2020, NeurIPS.
[8] Rich Caruana,et al. Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.
[9] Douglas A. Reynolds,et al. An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[10] Naoyuki Kanda,et al. End-to-End Neural Speaker Diarization with Self-Attention , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[11] Thomas Wolf,et al. A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks , 2018, AAAI.
[12] Nicholas W. D. Evans,et al. Speaker Diarization: A Review of Recent Research , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[13] M. Bar. A Cortical Mechanism for Triggering Top-Down Facilitation in Visual Object Recognition , 2003, Journal of Cognitive Neuroscience.
[14] Kenneth Ward Church,et al. The Second DIHARD Diarization Challenge: Dataset, task, and baselines , 2019, INTERSPEECH.
[15] Ming Li,et al. LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization , 2019, INTERSPEECH.
[16] Qiang Yang,et al. An Overview of Multi-task Learning , 2018 .
[17] Sanjeev Khudanpur,et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Naoyuki Kanda,et al. Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition , 2019, INTERSPEECH.
[19] Yu Zhang,et al. Learning to Multitask , 2018, NeurIPS.
[20] Alan McCree,et al. Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings , 2019, INTERSPEECH.
[21] Shinji Watanabe,et al. Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge , 2018, INTERSPEECH.
[22] Naoyuki Kanda,et al. End-to-End Neural Speaker Diarization with Permutation-Free Objectives , 2019, INTERSPEECH.
[23] Daniel Garcia-Romero,et al. Speaker diarization with plda i-vector scoring and unsupervised calibration , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[24] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[25] Quan Wang,et al. Fully Supervised Speaker Diarization , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Shinji Watanabe,et al. Online End-To-End Neural Diarization with Speaker-Tracing Buffer , 2021, 2021 IEEE Spoken Language Technology Workshop (SLT).
[27] Quan Wang,et al. Speaker Diarization with LSTM , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Jon Barker,et al. CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings , 2020, 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020).
[29] Yan Zhao,et al. A Joint Multi-Task Learning Framework for Spoken Language Understanding , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Naoyuki Kanda,et al. Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR , 2019, INTERSPEECH.
[31] Shinji Watanabe,et al. Neural Speaker Diarization with Speaker-Wise Chain Rule , 2020, ArXiv.
[32] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Aleksei Romanenko,et al. Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario , 2020, INTERSPEECH.
[34] Jianping Fan,et al. HD-MTL: Hierarchical Deep Multi-Task Learning for Large-Scale Visual Recognition , 2017, IEEE Transactions on Image Processing.
[35] Jon Barker,et al. The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines , 2018, INTERSPEECH.