A new speaker change detection method for two-speaker segmentation

In absence of prior information about speakers, an important step in speaker segmentation is to obtain initial estimates for training speaker models. In this paper, we present a new method for obtaining these estimates. The method assumes that a conversation must be initiated by one of the speakers. Thus one speaker model is estimated from the small segment at the beginning of the conversation and the segment that has the largest distance from the initial segment is used to train second speaker model. We describe a system based on this method and evaluate it on two different tasks: a controlled task with variations in the duration of the initial speaker segment and amount of overlapped speech and 2001 NIST Speaker Recognition Evaluation task that contains natural conversations. This system shows significant improvements over the conventional system in absence of overlapped speech on the controlled task.

[1]  F. Itakura Line spectrum representation of linear predictor coefficients of speech signals , 1975 .

[2]  Alexander H. Waibel,et al.  Strategies for automatic segmentation of audio data , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[3]  Douglas A. Reynolds,et al.  HTIMIT and LLHDB: speech corpora for the study of handset transducer effects , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Christian Wellekens,et al.  DISTBIC: A speaker-based segmentation for audio data indexing , 2000, Speech Commun..

[5]  S. Chen,et al.  Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998 .

[6]  Douglas A. Reynolds,et al.  The lincoln speaker recognition system: NIST eval2000 , 2000, INTERSPEECH.

[7]  Herbert Gish,et al.  Segregation of speakers for speech recognition and speaker identification , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.