The influence of speech activity detection and overlap on speaker diarization for meeting room recordings

Abstract

This paper addresses the problem of speaker diarization in the specific context of meeting room recordings, which often involve a high degree of spontaneous speech with large overlapped speech segments, speaker noise (laughs, whispers, coughs, etc.) and very short speaker turns. A large variability in signal quality has brought an additional level of complexity. This paper investigates the effects of speech activity detection and overlapped speech through speaker diarization experiments conducted on the NIST RT’05 and RT’06 data sets. Results indicate that our system is highly sensitive to the shape of the initial segmentation and that, perhaps surprisingly, perfect references can even degrade performance. Finally, we propose a direction for future research to incorporate confidence values according to acoustic attributes in order to unify what is currently a somewhat disjointed approach to speaker diarization.

Index Terms: speaker diarization, meeting room, speech activity detection, overlapped speech.
