Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection

We address the problem of effectively handling overlapping speech in a diarization system. First, we detail a neural Long Short-Term Memory (LSTM)-based architecture for overlap detection. Second, detected overlap regions are exploited in conjunction with a frame-level speaker posterior matrix to make two-speaker assignments for overlapped frames in the resegmentation step. The overlap detection module achieves state-of-the-art performance on the AMI, DIHARD, and ETAPE corpora. We apply overlap-aware resegmentation to AMI, resulting in a 20% relative reduction in diarization error rate (DER) over the baseline system. While this approach is by no means a complete solution to overlap-aware diarization, it reveals promising directions for handling overlap.
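The two-speaker assignment step in resegmentation can be illustrated with a minimal sketch. It assumes two inputs. The first is a frames-by-speakers posterior matrix produced by the resegmentation model. The second is a per-frame overlap score from the detector, which is binarized with a fixed threshold. The function names and the 0.5 threshold are hypothetical illustrations, not details taken from the paper, and the sketch leaves out both the LSTM detector and the model that produces the posteriors. Frames without overlap get the single most likely speaker. Frames flagged as overlap get the two most likely speakers.

```python
from typing import List, Sequence


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Turn per-frame overlap scores into a boolean overlap mask.

    The 0.5 default is a hypothetical choice; in practice the threshold
    would be tuned on a development set.
    """
    return [s > threshold for s in scores]


def overlap_aware_assign(
    posteriors: Sequence[Sequence[float]],
    overlap: Sequence[bool],
) -> List[List[int]]:
    """Assign speaker indices to every frame.

    posteriors[t][k] is the (assumed) posterior of speaker k at frame t.
    overlap[t] is True if frame t was flagged as overlapped speech.
    Returns, per frame, the top-1 speaker (no overlap) or the top-2
    speakers (overlap), ordered by decreasing posterior.
    """
    if len(posteriors) != len(overlap):
        raise ValueError("posteriors and overlap mask must cover the same frames")

    assignments: List[List[int]] = []
    for frame_post, is_overlap in zip(posteriors, overlap):
        # Rank speaker indices by posterior, highest first.
        ranked = sorted(range(len(frame_post)), key=lambda k: frame_post[k], reverse=True)
        # At most two simultaneous speakers are assumed in overlapped frames.
        n_speakers = 2 if is_overlap else 1
        assignments.append(ranked[:n_speakers])
    return assignments


if __name__ == "__main__":
    posteriors = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    mask = binarize([0.1, 0.9, 0.2])
    print(overlap_aware_assign(posteriors, mask))  # [[0], [0, 1], [2]]
```

Keeping only the two highest-posterior speakers on overlapped frames matches the two-speaker assignment described above. If a frame's posterior row lists a single speaker, the slice simply returns that one speaker.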
