An utterance comparison model for speaker clustering using factor analysis

We propose a novel utterance comparison model, based on probability theory and factor analysis, that computes the likelihood that two speech utterances originate from the same speaker. The model depends only on a set of statistics extracted from each utterance, so utterances can be compared efficiently from these statistics without storing the speech features indefinitely. We apply the model as a distance metric for speaker clustering on the CALLHOME telephone conversation corpus, where it achieves results competitive with three other known similarity measures: the Generalized Likelihood Ratio, the Cross-Likelihood Ratio, and eigenvoice distance.
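As a rough illustration of comparing utterances from stored statistics alone, the sketch below computes a Generalized Likelihood Ratio distance, one of the baseline measures named above, and not the proposed factor-analysis model. It assumes each utterance is modeled by a single diagonal-covariance Gaussian fitted by maximum likelihood, which is a simplification chosen here for brevity. The names `UtteranceStats`, `extract_stats`, `merge`, and `glr_distance` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class UtteranceStats:
    """Sufficient statistics for a diagonal Gaussian (assumed model)."""
    n: int            # number of feature frames
    s1: List[float]   # per-dimension sum of features
    s2: List[float]   # per-dimension sum of squared features


def extract_stats(frames: Sequence[Sequence[float]]) -> UtteranceStats:
    """Accumulate statistics once; the frames can then be discarded."""
    dim = len(frames[0])
    s1 = [0.0] * dim
    s2 = [0.0] * dim
    for frame in frames:
        for d, value in enumerate(frame):
            s1[d] += value
            s2[d] += value * value
    return UtteranceStats(len(frames), s1, s2)


def merge(a: UtteranceStats, b: UtteranceStats) -> UtteranceStats:
    """Statistics of the pooled utterance are the sums of the parts."""
    return UtteranceStats(
        a.n + b.n,
        [x + y for x, y in zip(a.s1, b.s1)],
        [x + y for x, y in zip(a.s2, b.s2)],
    )


def _half_n_log_det(stats: UtteranceStats, floor: float = 1e-6) -> float:
    """Return (n / 2) * log|Sigma| for the ML diagonal covariance."""
    total = 0.0
    for d in range(len(stats.s1)):
        mean = stats.s1[d] / stats.n
        # Variance floor (an assumption) guards against log(0).
        var = max(stats.s2[d] / stats.n - mean * mean, floor)
        total += math.log(var)
    return 0.5 * stats.n * total


def glr_distance(a: UtteranceStats, b: UtteranceStats) -> float:
    """Log-likelihood gain of two separate models over one pooled model.

    Larger values suggest the utterances come from different speakers.
    """
    pooled = merge(a, b)
    return _half_n_log_det(pooled) - _half_n_log_det(a) - _half_n_log_det(b)
```

A clustering procedure could call `extract_stats` once per utterance and then evaluate `glr_distance` on every pair without revisiting the audio features. Merging two clusters only requires `merge` on their statistics.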
