An unsupervised scheme for speaker indexing of audio databases

Speaker indexing of an audio database consists of organizing the audio data according to the speakers present in the database. This paper investigates automatically segmenting and clustering continuous audio streams by speaker, with no prior speaker models. The proposed scheme comprises two steps: (1) segmentation based on the GLR distance measure with BIC refinement, and (2) clustering based on agglomerative clustering with pruning selection. The aim is to produce exactly one pure cluster for each speaker. Results on data sets derived from the Switchboard corpus demonstrate the effectiveness of the proposed scheme.
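To make the segmentation step concrete, the sketch below illustrates (it is not the paper's implementation) how a Generalized Likelihood Ratio (GLR) distance and a BIC change score can be computed for two adjacent feature windows, each modeled as a single full-covariance Gaussian. The window sizes, feature dimension, regularization constant, and penalty weight lam are illustrative assumptions; a positive delta-BIC value suggests a speaker change at the window boundary, and the same distance could drive a subsequent agglomerative clustering of the resulting segments.

    # Minimal sketch of GLR-based change detection with BIC refinement.
    # Assumptions: full-covariance Gaussian per window, illustrative
    # window sizes, feature dimension d = 12, and penalty weight lam.
    import numpy as np

    def half_n_logdet_cov(X):
        """Return (n/2) * log|Sigma| for feature matrix X (rows = frames)."""
        n, d = X.shape
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)  # regularize
        _, logdet = np.linalg.slogdet(cov)
        return 0.5 * n * logdet

    def glr_distance(X, Y):
        """GLR distance between windows X and Y (single-Gaussian models)."""
        Z = np.vstack([X, Y])
        return half_n_logdet_cov(Z) - half_n_logdet_cov(X) - half_n_logdet_cov(Y)

    def delta_bic(X, Y, lam=1.0):
        """BIC difference; positive values support a change point between X and Y."""
        d = X.shape[1]
        n = X.shape[0] + Y.shape[0]
        penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
        return glr_distance(X, Y) - lam * penalty

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Two synthetic 12-dimensional "cepstral" windows with different means
        a = rng.normal(0.0, 1.0, size=(200, 12))
        b = rng.normal(1.5, 1.0, size=(200, 12))
        print("GLR distance:", glr_distance(a, b))
        print("Delta BIC:", delta_bic(a, b))  # > 0 suggests a speaker change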
