PLDA based speaker recognition on short utterances

This paper investigates the effects of limited speech data in the context of speaker verification using a probabilistic linear discriminant analysis (PLDA) approach. Being able to reduce the length of required speech data is important to the development of automatic speaker verification system in real world applications. When sufficient speech is available, previous research has shown that heavy-tailed PLDA (HTPLDA) modeling of speakers in the i-vector space provides state-of-the-art performance, however, the robustness of HTPLDA to the limited speech resources in development, enrolment and verification is an important issue that has not yet been investigated. In this paper, we analyze the speaker verification performance with regards to the duration of utterances used for both speaker evaluation (enrolment and verification) and score normalization and PLDA modeling during development. Two different approaches to total-variability representation are analyzed within the PLDA approach to show improved performance in short-utterance mismatched evaluation conditions and conditions for which insufficient speech resources are available for adequate system development. The results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset suggest that the HTPLDA system can continue to achieve better performance than Gaussian PLDA (GPLDA) as evaluation utterance lengths are decreased. We also highlight the importance of matching durations for score normalization and PLDA modeling to the expected evaluation conditions. Finally, we found that a pooled total-variability approach to PLDA modeling can achieve better performance than the traditional concatenated total-variability approach for short utterances in mismatched evaluation conditions and conditions for which insufficient speech resources are available for adequate system development.

[1]  Sridha Sridharan,et al.  Factor analysis subspace estimation for speaker verification with short utterances , 2008, INTERSPEECH.

[2]  Patrick Kenny,et al.  A Study of Interspeaker Variability in Speaker Verification , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Jeffery R. Price,et al.  Face recognition using direct, weighted linear discriminant analysis and modular subspaces , 2005, Pattern Recognit..

[4]  James R. Glass,et al.  Cosine Similarity Scoring without Score Normalization Techniques , 2010, Odyssey.

[5]  Lukás Burget,et al.  Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6]  James R. Glass,et al.  Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification , 2010, Odyssey.

[7]  Sridha Sridharan,et al.  Experiments in SVM-based Speaker Verification Using Short Utterances , 2010, Odyssey.

[8]  Sridha Sridharan,et al.  i-vector Based Speaker Recognition on Short Utterances , 2011, INTERSPEECH.

[9]  Patrick Kenny,et al.  Mixture of PLDA Models in i-vector Space for Gender-Independent Speaker Recognition , 2011, INTERSPEECH.

[10]  Patrick Kenny,et al.  Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification , 2009, INTERSPEECH.

[11]  Patrick Kenny,et al.  Bayesian Speaker Verification with Heavy-Tailed Priors , 2010, Odyssey.

[12]  David A. van Leeuwen,et al.  Improved speaker recognition when using i-vectors from multiple speech sources , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13]  Frank Dellaert,et al.  Robust Generative Subspace Modeling: The Subspace t Distribution , 2004 .

[14]  Sridha Sridharan,et al.  Factor analysis modelling for speaker verification with short utterances , 2008, Odyssey.