Paper information - Cross Likelihood Ratio Based Speaker Clustering Using Eigenvoice Models

This paper proposes using eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as the criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering with Gaussian Mixture Models. Eigenvoice modeling techniques have recently become increasingly popular because they can adequately represent a speaker from sparse training data and better capture differences in speaker characteristics. The paper therefore proposes capitalizing on the advantages of eigenvoice modeling within a CLR framework. Results on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, yielding a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
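To make the clustering criterion concrete, the following is a minimal sketch of how a CLR score between two clusters could be computed. It assumes the commonly used form of the CLR (the sum of two frame-normalized log-likelihood ratios against a background model); the abstract does not state the exact variant the paper uses. A single diagonal-covariance Gaussian stands in for the GMM or eigenvoice-adapted speaker models, the data are synthetic, and all function names are hypothetical.

```python
import math
import random

def fit_diag_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (a simplified stand-in for a speaker model)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance to avoid division by zero on degenerate data.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def cross_likelihood_ratio(frames_i, frames_j, model_i, model_j, background):
    """Assumed CLR form: each cluster's data is scored against the other
    cluster's model, relative to a shared background model."""
    term_i = avg_log_likelihood(frames_i, model_j) - avg_log_likelihood(frames_i, background)
    term_j = avg_log_likelihood(frames_j, model_i) - avg_log_likelihood(frames_j, background)
    return term_i + term_j

def synth(center, n, rng, dim=2):
    """Synthetic 'feature frames' drawn around a given center (illustrative only)."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(0)
a = synth(0.0, 200, rng)   # cluster A, speaker 1
b = synth(0.0, 200, rng)   # cluster B, also speaker 1
c = synth(3.0, 200, rng)   # cluster C, a different speaker

background = fit_diag_gaussian(a + b + c)  # pooled data as a toy background model
model_a, model_b, model_c = (fit_diag_gaussian(x) for x in (a, b, c))

clr_same = cross_likelihood_ratio(a, b, model_a, model_b, background)
clr_diff = cross_likelihood_ratio(a, c, model_a, model_c, background)
print("CLR same speaker:", clr_same)
print("CLR different speakers:", clr_diff)
```

In an agglomerative clustering loop, the pair of clusters with the highest CLR would typically be merged first, and merging would stop once the best score falls below a threshold. The threshold and the stopping rule are design choices not specified in the abstract.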

Sridha Sridharan | David Dean | Robbie Vogt | David Wang
