DiaCorrect: End-to-end error correction for speaker diarization
[1] Ming Li, et al. Incorporating End-to-End Framework Into Target-Speaker Voice Activity Detection, 2022, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[2] Y. Qian, et al. Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge, 2022, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Ming Li, et al. Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for the M2MeT Challenge, 2022, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] L. Burget, et al. DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction, 2021, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] H. Kim, et al. Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization, 2021, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Kyu J. Han, et al. A Review of Speaker Diarization: Recent Advances with Deep Learning, 2021, Comput. Speech Lang.
[7] Jiangyan Yi, et al. End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition, 2021, Interspeech.
[8] Jun Du, et al. Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker, 2021, Interspeech.
[9] Tie-Yan Liu, et al. FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition, 2021, NeurIPS.
[10] Kenneth Ward Church, et al. The Third DIHARD Diarization Challenge, 2020, Interspeech.
[11] Lukáš Burget, et al. Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks, 2020, Comput. Speech Lang.
[12] Shinji Watanabe, et al. End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors, 2020, Interspeech.
[13] Aleksei Romanenko, et al. Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario, 2020, Interspeech.
[14] Jon Barker, et al. CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings, 2020, 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020).
[15] Shuai Wang, et al. BUT System for the Second DIHARD Speech Diarization Challenge, 2020, 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Shinji Watanabe, et al. End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification, 2020, arXiv.
[17] Shiliang Zhang, et al. Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition, 2019, Interspeech.
[18] Naoyuki Kanda, et al. End-to-End Neural Speaker Diarization with Self-Attention, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[19] Naoyuki Kanda, et al. End-to-End Neural Speaker Diarization with Permutation-Free Objectives, 2019, Interspeech.
[20] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Vladlen Koltun, et al. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling, 2018, arXiv.
[22] Dong Yu, et al. Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[23] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[24] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Patrick Kenny, et al. Front-End Factor Analysis for Speaker Verification, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[26] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[27] John S. Garofolo, et al. The Rich Transcription 2006 Spring Meeting Recognition Evaluation, 2006, Machine Learning for Multimodal Interaction.