Fast Scoring of Full Posterior PLDA Models

The combination of a low-dimensional representation of a speech segment, the so-called i-vector, with probabilistic linear discriminant analysis (PLDA) models is the current state of the art in speaker recognition. An i-vector is a compact representation of a Gaussian Mixture Model (GMM) supervector that captures most of the variability of GMM supervectors. It is usually obtained as a MAP point estimate, namely the mean of a posterior distribution. A PLDA model has recently been presented that, unlike the standard one, exploits the uncertainty intrinsic to the i-vector estimate, i.e., the covariance of this posterior distribution. This approach, referred to in this paper as Full Posterior Distribution PLDA (FP-PLDA), is particularly effective for speaker detection on short speech segments of variable duration. However, it is computationally far more expensive than standard PLDA, which makes it unattractive for real applications. This paper presents three simplifications of FP-PLDA based on approximate diagonalizations of the matrices involved in FP-PLDA scoring. Applying these approximations in sequence yields computational costs comparable to those of PLDA, with only a small performance degradation with respect to the more accurate, but less efficient, FP-PLDA models. In particular, on short speech segments of variable duration, randomly extracted from the interviews and telephone conversations included in the NIST SRE 2010 extended dataset, performance up to 10% better than PLDA is obtained at similar computational complexity. The benefits of the proposed diagonalization approaches have also been confirmed on a short-utterance, text-independent verification task, where improvements of approximately 43% in EER and 34% in minimum DCF08 were obtained with respect to PLDA.
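
To make the cost argument concrete, the following is a minimal NumPy sketch, not the paper's algorithm. It assumes a two-covariance PLDA model with between-speaker covariance B and residual covariance Sigma, and it assumes that the FP-PLDA uncertainty enters by adding each i-vector's posterior covariance Ginv_i to Sigma. Under these assumptions, `llr_full` computes the exact trial log-likelihood ratio, which requires factorizing a 2D x 2D matrix for every trial. The hypothetical helpers `joint_diagonalizer`, `utterance_noise_diag` and `llr_diag` illustrate one generic diagonalization: Sigma and B are jointly diagonalized, and the transformed Ginv_i is replaced by its diagonal, so the trial decouples into D independent two-dimensional problems. This is a single illustrative approximation, not necessarily any of the three simplifications proposed in the paper.

```python
import numpy as np

# Hypothetical sketch. Assumptions: two-covariance PLDA (between-speaker
# covariance B, residual covariance Sigma); FP-PLDA uncertainty modeled as a
# per-utterance residual covariance Sigma + Ginv_i.


def gauss_logpdf(x, cov):
    """Log-density of the zero-mean Gaussian N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + x @ np.linalg.solve(cov, x))


def llr_full(w1, w2, m, B, Sigma, Ginv1, Ginv2):
    """Reference trial LLR: each i-vector has its own residual covariance,
    so a 2D x 2D joint covariance must be factorized for every trial."""
    D = m.size
    S1, S2 = Sigma + Ginv1, Sigma + Ginv2
    x = np.concatenate([w1 - m, w2 - m])
    zero = np.zeros((D, D))
    same = np.block([[B + S1, B], [B, B + S2]])        # shared speaker factor
    diff = np.block([[B + S1, zero], [zero, B + S2]])  # independent speakers
    return gauss_logpdf(x, same) - gauss_logpdf(x, diff)


def joint_diagonalizer(B, Sigma):
    """Return (T, lam) with T @ Sigma @ T.T = I and T @ B @ T.T = diag(lam)."""
    Linv = np.linalg.inv(np.linalg.cholesky(Sigma))
    lam, Q = np.linalg.eigh(Linv @ B @ Linv.T)
    return Q.T @ Linv, lam


def utterance_noise_diag(T, Ginv):
    """Per-utterance step: keep only the diagonal of T Ginv T^T (the approximation)."""
    return 1.0 + np.diag(T @ Ginv @ T.T)


def llr_diag(x1, s1, x2, s2, lam):
    """Approximate LLR in the transformed space: D independent 2x2 problems, O(D) per trial."""
    a, c = lam + s1, lam + s2  # diagonal entries of each 2x2 same-speaker covariance
    det = a * c - lam * lam    # off-diagonal entry of that covariance is lam
    q_same = (c * x1**2 - 2.0 * lam * x1 * x2 + a * x2**2) / det
    q_diff = x1**2 / a + x2**2 / c
    return float(0.5 * np.sum(np.log(a * c) - np.log(det) + q_diff - q_same))
```

In use, x_i = T @ (w_i - m) and s_i = utterance_noise_diag(T, Ginv_i) are computed once per utterance, so the cubic-cost work is amortized and each trial call to llr_diag costs O(D). The approximation is exact only when T Ginv_i T^T is already diagonal; otherwise its off-diagonal terms are discarded, which is where this kind of simplification can lose accuracy.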
