Speaker based segmentation on broadcast news - on the use of ISI technique

In this paper we propose a new segmentation technique called ISI, or “Interlaced Speech Indexing”, developed and implemented for the task of broadcast news indexing. The task consists of finding the identity of a well-defined speaker and the moments of his interventions within an audio document, so that his speech, and thus his talk, can be accessed rapidly, directly and easily. Our segmentation procedure is based on an interlaced equidistant segmentation (IES) associated with our new ISI algorithm. This approach uses a speaker identification method based on Second Order Statistical Measures (SOSM). As the SOSM measure, we choose “µGc”, which is based on the covariance matrix. However, experiments showed that this method needs a speech length of at least 2 seconds, which means that the segmentation resolution would be 2 seconds. By combining the SOSM with the new indexing technique (ISI), we demonstrate that the average segmentation error is reduced to only 0.5 seconds, which is more accurate and of greater interest for real-time applications. Results indicate that this association provides high resolution and high tracking performance: the indexing score (percentage of correctly labelled segments) is 95% on the TIMIT database and 92.4% on the Hub4 Broadcast News 96 database.
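To illustrate how interlaced windows can give a finer resolution than the 2-second minimum speech length, here is a minimal sketch. It is not the ISI algorithm itself, whose details are not given here. It assumes 2-second analysis windows started every 0.5 seconds, a hypothetical per-window speaker label (standing in for the output of the SOSM identifier), and a simple majority vote per 0.5-second slot. The function names, the vote, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter

def interlaced_windows(n_slots, slots_per_window):
    """Window n covers slots [n, n + slots_per_window); windows are offset by one slot."""
    if n_slots < slots_per_window:
        raise ValueError("recording is shorter than one analysis window")
    return [(n, n + slots_per_window) for n in range(n_slots - slots_per_window + 1)]

def vote_per_slot(windows, labels, n_slots):
    """Assign each slot the label most frequent among the windows covering it.

    Assumption: ties keep the previous slot's label when it is among the tied ones.
    """
    votes = [Counter() for _ in range(n_slots)]
    for (start, end), label in zip(windows, labels):
        for slot in range(start, end):
            votes[slot][label] += 1
    result = []
    for counter in votes:
        best = max(counter.values())
        tied = [label for label, count in counter.items() if count == best]
        if result and result[-1] in tied:
            result.append(result[-1])
        else:
            result.append(tied[0])
    return result

# Illustrative setup: 8 s of audio, 0.5 s slots, 2 s windows (4 slots each).
SLOT_SECONDS = 0.5
n_slots = 16
windows = interlaced_windows(n_slots, slots_per_window=4)

# Hypothetical identifier output: one speaker label per 2 s window.
window_labels = ["A"] * 5 + ["B"] * 8

slot_labels = vote_per_slot(windows, window_labels, n_slots)

# Speaker change = first slot whose label differs from the previous one.
change_slot = next(i for i in range(1, n_slots) if slot_labels[i] != slot_labels[i - 1])
print(f"speaker change near {change_slot * SLOT_SECONDS} s")
```

In this toy run the speaker change is placed on a 0.5-second grid even though every individual decision uses 2 seconds of speech, which is the general idea behind interlacing the windows.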
