Using a priori information for speaker diarization

This paper presents an attempt to use supplementary information for audio data diarization. The approach relies on a priori information about the speakers involved in a dialogue: the number of speakers in the conversation, and training data available for one speaker or for all of them. The experiments were conducted mainly on the 2003 Rich Transcription Diarization corpus, using both the Dry Run corpus and the Evaluation corpus. The results show that knowing the exact number of speakers a priori does not appear to be very useful. On the other hand, using a priori speaker models for one or all of the speakers involved in the conversation may improve diarization performance when enough data is available to train reliable speaker models.
