Front-End Factor Analysis For Speaker Verification

It is known that the performance of the i-vectors/PLDA based speaker verification systems is affected in the cases of short utterances and limited training data. The performance degradation appears because the shorter the utterance, the less reliable the extracted i-vector is, and because the total variability covariance matrix and the underlying PLDA matrices need a significant amount of data to be robustly estimated. Considering the “MIT Mobile Device Speaker Verification Corpus” (MIT-MDSVC) as a representative dataset for robust speaker verification tasks on limited amount of training data, this paper investigates which configuration and which parameters lead to the best performance of an i-vectors/PLDA based speaker verification. The i-vectors/PLDA based system achieved good performance only when the total variability matrix and the underlying PLDA matrices were trained with data belonging to the enrolled speakers. This way of training means that the system should be fully retrained when new enrolled speakers were added. The performance of the system was more sensitive to the amount of training data of the underlying PLDA matrices than to the amount of training data of the total variability matrix. Overall, the Equal Error Rate performance of the i-vectors/PLDA based system was around 1% below the performance of a GMM-UBM system on the chosen dataset. The paper presents at the end some preliminary experiments in which the utterances comprised in the CSTR VCTK corpus were used besides utterances from MIT-MDSVC for training the total variability covariance matrix and the underlying PLDA matrices.

[1]  Michael E. Tipping,et al.  Probabilistic Principal Component Analysis , 1999 .

[2]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[3]  Ram H. Woo,et al.  Exploration of small enrollment speaker verification on handheld devices , 2005 .

[4]  Alex Park,et al.  The MIT Mobile Device Speaker Verification Corpus: Data Collection and Preliminary Experiments , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[5]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[6]  Patrick Kenny,et al.  Bayesian Speaker Verification with Heavy-Tailed Priors , 2010, Odyssey.

[7]  Patrick Kenny,et al.  An i-vector Extractor Suitable for Speaker Recognition with both Microphone and Telephone Speech , 2010, Odyssey.

[8]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  Daniel Garcia-Romero,et al.  Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.

[10]  Driss Matrouf,et al.  Intersession Compensation and Scoring Methods in the i-vectors Space for Speaker Recognition , 2011, INTERSPEECH.

[11]  Daniel Garcia-Romero,et al.  Multicondition training of Gaussian PLDA models in i-vector space for noise and reverberation robust speaker recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12]  Themos Stafylakis,et al.  PLDA for speaker verification with utterances of arbitrary duration , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Haizhou Li,et al.  ALIZE 3.0 - open source toolkit for state-of-the-art speaker recognition , 2013, INTERSPEECH.

[14]  Tomoki Toda,et al.  SAS: A speaker verification spoofing database containing diverse attacks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Florin Curelaru Evaluation of the generative and discriminative text-independent speaker verification approaches on handheld devices , 2015, 2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD).

[16]  Junichi Yamagishi,et al.  SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2016 .

[17]  George D. C. Cavalcanti,et al.  Optimizing speaker-specific filter banks for speaker verification , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).