Generalized Variability Model for Speaker Verification

In this letter, we propose a generalized variability model as an extension of the total variability model. In its typical setup, the total variability model assumes a standard normal prior distribution for the latent variable. The proposed generalized variability model relaxes this assumption and allows the latent variable distribution to be a mixture of Gaussians. The conventional total variability model is then a special case of the generalized model in which the number of mixture components is constrained to one. We validate the proposed model on speaker verification tasks using both the standard and the extended NIST SRE 2010 datasets. Experimental results show that modeling the distribution of the latent variables as a mixture of Gaussians improves performance under all conditions. The gain is expected to be greater for speaker verification on short utterances.
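The following minimal NumPy sketch illustrates the special-case relationship. It performs latent-variable (i-vector) posterior inference after replacing the standard normal prior with a mixture of Gaussians. It assumes, for illustration only, that the usual Baum-Welch-statistics likelihood of the total variability model stays unchanged and that only the prior on the latent variable w changes. The letter's exact estimation and training procedure is not reproduced here. The function name `ivector_posterior_mixture_prior` and its argument layout are hypothetical.

Under these assumptions, each prior component k (weight pi_k, mean mu_k, covariance Sigma_k) yields a Gaussian posterior with precision Sigma_k^{-1} + T^T Sigma^{-1} N T. Each component is then reweighted by its evidence, obtained by completing the square.

```python
import numpy as np


def ivector_posterior_mixture_prior(T, sigma_diag, n_counts, f_centered,
                                    weights, means, covs):
    """Posterior of the latent variable w under a mixture-of-Gaussians prior.

    Hypothetical interface; assumed shapes:
      T          : (C*D, R)  total variability matrix
      sigma_diag : (C*D,)    diagonal UBM covariances, stacked per component
      n_counts   : (C,)      zeroth-order Baum-Welch statistics
      f_centered : (C*D,)    centered first-order Baum-Welch statistics
      weights    : (K,)      prior mixture weights pi_k
      means      : (K, R)    prior component means mu_k
      covs       : (K, R, R) prior component covariances Sigma_k
    """
    dim = T.shape[0] // n_counts.shape[0]          # feature dimension D
    n_expanded = np.repeat(n_counts, dim)          # N_c repeated over each D-block
    TtSinv = T.T / sigma_diag                      # T^T Sigma^{-1}, shape (R, C*D)
    A = (TtSinv * n_expanded) @ T                  # T^T Sigma^{-1} N T
    g = TtSinv @ f_centered                        # T^T Sigma^{-1} F~

    n_comp = len(weights)
    log_evidence = np.empty(n_comp)
    comp_means = np.empty((n_comp, T.shape[1]))
    for k, (mu, S) in enumerate(zip(means, covs)):
        S_inv = np.linalg.inv(S)
        P = S_inv + A                              # posterior precision for component k
        b = S_inv @ mu + g                         # linear term after completing the square
        m = np.linalg.solve(P, b)                  # posterior mean for component k
        comp_means[k] = m
        _, logdet_S = np.linalg.slogdet(S)
        _, logdet_P = np.linalg.slogdet(P)
        # log p(X | k), up to a constant that does not depend on k
        log_evidence[k] = 0.5 * (b @ m - mu @ S_inv @ mu - logdet_S - logdet_P)

    log_post = np.log(weights) + log_evidence
    resp = np.exp(log_post - log_post.max())       # stabilized exponentiation
    resp /= resp.sum()                             # posterior component probabilities
    w_mean = resp @ comp_means                     # mean of the mixture posterior
    return resp, comp_means, w_mean
```

Consider a single component (K = 1) with mu_1 = 0 and Sigma_1 = I. In that case the function returns the conventional i-vector (I + T^T Sigma^{-1} N T)^{-1} T^T Sigma^{-1} F~, which mirrors the special case described above. The sketch returns the responsibilities and per-component means, not only the overall mean, so a downstream back end can still use the multimodal posterior.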
