Iterative Unsupervised GMM Training for Speaker Indexing

The paper presents a novel algorithm for speaker search and indexing based on unsupervised GMM training. The proposed method does not require a predefined set of generic background models; the GMM speaker models are trained only from the test samples. The constraint of the method is that the number of speakers has to be known in advance. The results of initial experiments show that the proposed training method makes it possible to create precise GMM speaker models from only a small amount of training data.
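The abstract does not spell out the training loop, so the following is only a minimal sketch of one generic way to train speaker models iteratively from test data alone, given a known number of speakers. It makes several assumptions that are not taken from the paper: the audio is already cut into segments of feature vectors, a single diagonal-covariance Gaussian stands in for each GMM, labels are initialised round-robin, and segments are reassigned by hard maximum-likelihood decisions. The names `fit_diag_gaussian`, `avg_log_likelihood`, and `index_speakers` are hypothetical.

```python
import math


def fit_diag_gaussian(frames):
    """Fit per-dimension mean and variance (a one-component stand-in for a GMM)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a very tight cluster cannot cause division by zero.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-3)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of a segment under one speaker model."""
    if model is None:  # speaker currently has no segments assigned
        return float("-inf")
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def index_speakers(segments, n_speakers, n_iter=20):
    """Label each segment with a speaker index in range(n_speakers).

    segments: list of segments, each a list of feature vectors (lists of floats).
    """
    # Assumed initialisation: round-robin labels (not from the paper).
    labels = [i % n_speakers for i in range(len(segments))]
    for _ in range(n_iter):
        # Train one model per speaker from the segments currently assigned to it.
        models = []
        for k in range(n_speakers):
            frames = [f for seg, lab in zip(segments, labels) if lab == k
                      for f in seg]
            models.append(fit_diag_gaussian(frames) if frames else None)
        # Reassign every segment to the model that explains it best.
        new_labels = [
            max(range(n_speakers),
                key=lambda k: avg_log_likelihood(seg, models[k]))
            for seg in segments
        ]
        if new_labels == labels:  # assignments are stable, so stop early
            break
        labels = new_labels
    return labels
```

In this sketch no background model is involved: every model is estimated from the test segments themselves, and `n_speakers` is the quantity that must be supplied in advance. A closer approximation of the described setting would replace the single Gaussian with a multi-component GMM trained by EM.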
