SPEAKER DIARIZATION IN THE ELISA CONSORTIUM OVER THE LAST 4 YEARS

This paper summarizes the collaboration between the LIA and CLIPS laboratories, members of the ELISA consortium, over the last four years of NIST speaker diarization evaluation campaigns. In this context, each lab independently developed its own approach to the speaker segmentation task, and the two approaches differ substantially. The first relies on a classical two-step strategy, in which speaker turns are detected and the resulting segments are then clustered. The second is an integrated strategy in which segment boundaries and the assignment of segments to speakers are estimated simultaneously and revisited throughout the process. Building on these two methods, various strategies were investigated for fusing the segmentation results. Drawing on the performance achieved across the evaluation campaigns and on the experience gained by the LIA and CLIPS labs in speaker diarization, the paper discusses the overall work carried out in this evaluation context and proposes directions for further investigation.
