Information bottleneck based speaker diarization of meetings using non-speech as side information

Background noise and errors in speech/non-speech detection significantly degrade the output of a speaker diarization system. In a typical speaker diarization system, non-speech segments are excluded prior to unsupervised clustering. In the current study, we exploit the information present in the non-speech segments of a recording to improve the output of a speaker diarization system based on the information bottleneck framework. This is achieved by providing information from non-speech segments as side (irrelevant) information to the information bottleneck based clustering. Experiments on meeting recordings from the RT 06, 07, and 09 evaluation sets show that the proposed method decreases the diarization error rate by around 18% relative to the baseline speaker diarization system based on the information bottleneck framework. Comparison with a state-of-the-art system based on the HMM/GMM framework shows that the proposed method significantly narrows the performance gap between the information bottleneck system and the HMM/GMM system.
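The role of the non-speech side information can be illustrated with the objective of the information bottleneck with side information (Chechik and Tishby's formulation). For a clustering C of segments X, a relevant variable Y+ (speech) and an irrelevant variable Y- (non-speech), it minimizes I(C;X) - β[I(C;Y+) - γ I(C;Y-)]. The Python sketch below is an illustration under stated assumptions, not the system described in this work: it assumes each segment comes with discrete distributions p(y+|x) and p(y-|x) (for example, posteriors over speech and non-speech mixture components), uses hard clusters so that I(C;X) = H(C), and greedily merges the pair of clusters that yields the lowest objective. The function names, the toy data, and the values of β and γ are all hypothetical.

```python
import numpy as np


def mutual_information(p_c, p_y_given_c):
    """I(C;Y) = sum_c p(c) * KL(p(y|c) || p(y)), in nats."""
    p_y = p_c @ p_y_given_c  # marginal p(y) = sum_c p(c) p(y|c)
    mi = 0.0
    for pc, row in zip(p_c, p_y_given_c):
        mask = row > 0  # 0 * log 0 is treated as 0
        mi += pc * np.sum(row[mask] * np.log(row[mask] / p_y[mask]))
    return mi


def cluster_stats(labels, p_x, p_y_given_x):
    """Return p(c) and p(y|c) for a hard assignment of segments to clusters."""
    clusters = np.unique(labels)  # sorted cluster ids, same order for both outputs
    p_c = np.array([p_x[labels == c].sum() for c in clusters])
    p_y_given_c = np.array([
        (p_x[labels == c][:, None] * p_y_given_x[labels == c]).sum(axis=0) / pc
        for c, pc in zip(clusters, p_c)
    ])
    return p_c, p_y_given_c


def ib_side_objective(labels, p_x, p_pos, p_neg, beta, gamma):
    """F = I(C;X) - beta * (I(C;Y+) - gamma * I(C;Y-)); lower is better.

    For a hard clustering H(C|X) = 0, so I(C;X) = H(C).
    """
    p_c, pos_c = cluster_stats(labels, p_x, p_pos)
    _, neg_c = cluster_stats(labels, p_x, p_neg)
    h_c = -np.sum(p_c * np.log(p_c))
    relevant = mutual_information(p_c, pos_c)
    irrelevant = mutual_information(p_c, neg_c)
    return h_c - beta * (relevant - gamma * irrelevant)


def agglomerate(p_x, p_pos, p_neg, n_clusters, beta=10.0, gamma=1.0):
    """Greedy bottom-up clustering: start from singletons and repeatedly apply
    the merge that gives the lowest objective, until n_clusters remain."""
    labels = np.arange(len(p_x))
    while len(np.unique(labels)) > n_clusters:
        ids = np.unique(labels)
        best = None
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                trial = np.where(labels == ids[j], ids[i], labels)
                f = ib_side_objective(trial, p_x, p_pos, p_neg, beta, gamma)
                if best is None or f < best[0]:
                    best = (f, trial)
        labels = best[1]
    return labels


# Hypothetical toy data: segments 0 and 1 share one speaker, 2 and 3 another,
# while the non-speech (background) distributions pair 0 with 2 and 1 with 3.
p_x = np.full(4, 0.25)
p_pos = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
p_neg = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]])
labels = agglomerate(p_x, p_pos, p_neg, n_clusters=2, beta=10.0, gamma=1.0)
print(labels)  # prints [0 0 2 2]
```

With γ = 0 the objective reduces to the ordinary agglomerative information bottleneck criterion. A positive γ additionally penalizes clusters that mainly encode the non-speech (background) structure, so in the toy example the grouping {0, 1}, {2, 3} scores better than the background-driven grouping {0, 2}, {1, 3}.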
