Speaker Clustering of Stereo Audio Documents Based on Sequential Gathering Process

This paper focuses on sequential speaker clustering of stereo audio documents, which classifies the speech segments in those documents according to the speakers taking part in the recording. Speaker clustering is generally the second step of a complete speaker diarization system, the first step being speaker segmentation. In some applications, however, the terms speaker diarization and speaker clustering are used interchangeably, because the homogeneous segments are already separated. This is the case for telephone answering machines or voice mailboxes, where the voice messages are delimited by a beep. Although our overall project uses two main techniques, based on speaker localization and speaker discrimination, this paper describes only the second technique, which uses a sequential clustering approach to gather similar homogeneous segments into speaker classes. Each class contains all the interventions of a single speaker in the entire audio document. The sequential clustering approach uses a mono-gaussian measure ( G) to assess the degree of similarity between homogeneous segments. The application concerns the clustering of stereo debates between several speakers located at different positions in the meeting room. For the evaluation, experiments are conducted on a stereophonic database called DB15, which is composed of 15 scenarios of about 3.5 min each, with two or three speakers speaking sequentially in every scenario. The new algorithm performs well when the speech segments are longer than 4 s.

Keywords: speaker clustering, speaker diarization, sequential clustering, stereo audio document, second order statistical measures, mono-gaussian measures.
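The sequential gathering idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact mono-gaussian measure, so the sketch swaps in a symmetric Kullback-Leibler divergence between single full-covariance Gaussians as a stand-in second-order measure. The function names, the threshold value and the synthetic feature frames are all assumptions made for illustration.

```python
import numpy as np

def gaussian_stats(frames):
    """Mean and (lightly regularized) covariance of a (n_frames, dim) array."""
    mean = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False) + 1e-6 * np.eye(frames.shape[1])
    return mean, cov

def symmetric_kl(frames_a, frames_b):
    """Stand-in similarity measure (assumption): KL(a||b) + KL(b||a)
    between one Gaussian fitted to each set of frames."""
    m_a, c_a = gaussian_stats(frames_a)
    m_b, c_b = gaussian_stats(frames_b)
    inv_a, inv_b = np.linalg.inv(c_a), np.linalg.inv(c_b)
    diff = m_a - m_b
    dim = diff.shape[0]
    # The log-determinant terms of the two KL divergences cancel in the sum.
    return 0.5 * (np.trace(inv_b @ c_a) + np.trace(inv_a @ c_b)
                  + diff @ (inv_a + inv_b) @ diff) - dim

def sequential_cluster(segments, threshold):
    """Process segments in order: join the closest existing class if it is
    closer than `threshold` (hypothetical value), otherwise open a new class."""
    clusters = []  # pooled frames of each speaker class
    labels = []
    for seg in segments:
        dists = [symmetric_kl(seg, pooled) for pooled in clusters]
        if dists and min(dists) < threshold:
            best = int(np.argmin(dists))
            clusters[best] = np.vstack([clusters[best], seg])  # update class model
        else:
            best = len(clusters)
            clusters.append(seg)
        labels.append(best)
    return labels

# Synthetic demo (assumption): two "speakers" alternate, 200 frames per segment.
rng = np.random.default_rng(0)
speaker_means = [np.zeros(4), np.full(4, 5.0)]
segments = [rng.normal(speaker_means[k % 2], 1.0, size=(200, 4)) for k in range(4)]
labels = sequential_cluster(segments, threshold=5.0)
print(labels)
```

In this sketch each class is modelled by pooling the frames of all segments assigned to it, so the class statistics become more reliable as more speech is gathered. Short segments give noisy covariance estimates, which is one plausible reason why a measure of this kind needs segments of several seconds to work well.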
