A method for on-line speaker indexing using generic reference models

On-line speaker indexing is useful for multimedia applications such as archiving and browsing meetings or teleconferences. It sequentially detects the points where the speaker identity changes in a multi-speaker audio stream and classifies each speaker segment. The main difficulty of on-line processing is that any decision can use only the current and previous information in the data stream. To address this difficulty, we apply a predetermined set of speaker-independent reference models. This set allows more accurate speaker modeling and clustering without actually training speaker models on the target data. Once a speaker-independent model is selected from the reference set, it is progressively adapted into a speaker-dependent model. Experiments were performed with the HUB-4 Broadcast News Evaluation English Test Material (1999) and the Speaker Recognition Benchmark NIST Speech (1999). The results showed that the new technique gave 96.5% indexing accuracy on a telephone conversation data source and 84.3% accuracy on a broadcast news source.
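The select-then-adapt idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the model family, the selection score, or the adaptation rule. The sketch assumes each reference model is a single diagonal-covariance Gaussian, that selection picks the reference with the highest average log-likelihood on the current segment, and that adaptation is a MAP-style interpolation of the mean. The names `log_likelihood`, `select_reference`, `map_adapt_mean`, and the `relevance` parameter are all hypothetical.

```python
import numpy as np


def log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian.

    Assumption: one Gaussian per reference model, chosen for brevity.
    """
    diff = frames - mean
    per_frame = -0.5 * (np.log(2.0 * np.pi * var) + diff ** 2 / var).sum(axis=1)
    return per_frame.mean()


def select_reference(frames, ref_means, ref_vars):
    """Return the index of the reference model that best fits the segment."""
    scores = [log_likelihood(frames, m, v) for m, v in zip(ref_means, ref_vars)]
    return int(np.argmax(scores))


def map_adapt_mean(frames, prior_mean, relevance=16.0):
    """Move the selected model's mean toward the segment data.

    Assumption: MAP-style interpolation; `relevance` is a hypothetical
    constant controlling how quickly data overrides the prior mean.
    """
    n = len(frames)
    alpha = n / (n + relevance)
    return alpha * frames.mean(axis=0) + (1.0 - alpha) * prior_mean


# Usage: two hypothetical reference models and one synthetic speech segment.
rng = np.random.default_rng(0)
ref_means = np.array([[0.0, 0.0], [5.0, 5.0]])
ref_vars = np.ones_like(ref_means)
segment = rng.normal(4.0, 1.0, size=(50, 2))

best = select_reference(segment, ref_means, ref_vars)
adapted_mean = map_adapt_mean(segment, ref_means[best])
```

Because the adapted mean depends only on frames seen so far, repeating the update as new segments arrive from the same speaker is consistent with the on-line constraint: the model drifts from speaker-independent toward speaker-dependent without any look-ahead.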
