An Improved Uncertainty Propagation Method for Robust I-vector Based Speaker Recognition

The performance of automatic speaker recognition systems degrades on distorted speech containing additive noise and/or reverberation. Statistical uncertainty propagation has been introduced as a promising paradigm to address this challenge, and several uncertainty propagation methods have been proposed to compensate for noise and reverberation in i-vector based speaker recognition. These methods have achieved promising results on small datasets such as YOHO and Wall Street Journal, but little or no improvement on the larger, highly variable NIST Speaker Recognition Evaluation (SRE) corpus. In this paper, we propose a complete uncertainty propagation method that models the effect of uncertainty both in the computation of unbiased Baum-Welch statistics and in the derivation of the posterior expectation of the i-vector. We conduct experiments on the NIST-SRE corpus mixed with real domestic noise and reverberation from the CHiME-2 corpus and preprocessed by multichannel speech enhancement. The proposed method reduces the equal error rate (EER) by 4% relative to a conventional i-vector based speaker verification baseline, whereas previous uncertainty propagation methods degrade performance in this setting.
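The abstract states that uncertainty is modeled both in the Baum-Welch statistics and in the i-vector posterior, but it does not give the equations. As a rough illustration only, the Python/NumPy sketch below computes an uncertainty-aware i-vector posterior under assumptions introduced here, not taken from the paper: diagonal UBM covariances Σ_c, a diagonal per-frame uncertainty covariance Σ_t supplied by the speech enhancement front end, and the widened covariance Σ_c + Σ_t used both for the frame occupation probabilities and for the i-vector posterior precision and mean. This follows the general pattern of earlier uncertainty propagation for i-vectors rather than the specific unbiased statistics proposed in this paper. The function name `ivector_posterior` and its argument layout are hypothetical.

```python
import numpy as np


def ivector_posterior(X, Sigma_x, weights, means, covs, T):
    """Hypothetical sketch of an uncertainty-aware i-vector posterior.

    Assumptions (illustrative, not the paper's exact method):
      X       : (N, D) enhanced feature frames
      Sigma_x : (N, D) diagonal uncertainty variance of each frame
      weights : (C,)   UBM mixture weights
      means   : (C, D) UBM means m_c
      covs    : (C, D) diagonal UBM covariances Sigma_c
      T       : (C, D, R) per-component blocks T_c of the total variability matrix
    Returns the posterior mean (R,) and posterior precision (R, R).
    """
    C, _, R = T.shape
    # Widened variance Sigma_c + Sigma_t for every frame t and component c: (N, C, D)
    var = covs[None, :, :] + Sigma_x[:, None, :]
    # Centered frames x_t - m_c: (N, C, D)
    diff = X[:, None, :] - means[None, :, :]

    # Frame occupation probabilities gamma_tc under the widened Gaussians
    loglik = -0.5 * (np.sum(np.log(2.0 * np.pi * var), axis=2)
                     + np.sum(diff ** 2 / var, axis=2))
    logpost = np.log(weights)[None, :] + loglik
    logpost -= logpost.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(logpost)
    gamma /= gamma.sum(axis=1, keepdims=True)      # (N, C)

    precision = np.eye(R)   # standard normal prior on the i-vector
    linear = np.zeros(R)
    for c in range(C):
        Tc = T[c]                                           # (D, R)
        # gamma_tc * (Sigma_c + Sigma_t)^{-1}, one diagonal per frame: (N, D)
        weighted_prec = gamma[:, c][:, None] / var[:, c, :]
        # sum_t gamma_tc T_c^T (Sigma_c + Sigma_t)^{-1} T_c
        precision += Tc.T @ (weighted_prec.sum(axis=0)[:, None] * Tc)
        # sum_t gamma_tc T_c^T (Sigma_c + Sigma_t)^{-1} (x_t - m_c)
        linear += Tc.T @ (weighted_prec * diff[:, c, :]).sum(axis=0)

    mean = np.linalg.solve(precision, linear)
    return mean, precision
```

With zero uncertainty the function reduces to the conventional i-vector posterior built from zeroth- and first-order Baum-Welch statistics; as the uncertainty grows, each frame contributes less, so the precision tends to the identity and the i-vector is pulled toward its prior mean of zero.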
