I-vector Kullback-Leibler divisive normalization for PLDA speaker verification

The combination of i-vectors and Probabilistic Linear Discriminant Analysis (PLDA) represents the state of the art in speaker verification systems. PLDA assumes that the i-vectors follow a Gaussian distribution. However, raw i-vectors fit this assumption poorly, so the model performs badly unless the i-vectors are first Gaussianized. Unlike previous Gaussianization methods, the proposed method places no restriction on the original distribution of the i-vectors, which makes it flexible and widely applicable. To optimize the Gaussianizing transformation function, the Kullback-Leibler divergence (KLD) is used to measure the distance between the distribution of the transformed i-vectors and the target Gaussian distribution. By minimizing the KLD on the development data, we find the optimal parameters of the transformation function. The proposed method yields a significant improvement on the NIST SRE 2008 core set; combined with length normalization (LN), a widely used Gaussianization method, it further improves verification accuracy.
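
The abstract does not give the form of the transformation function, so the following is only a minimal sketch of the general idea: choose the parameter of a divisive transform by minimizing an estimated KLD to a standard Gaussian on development data. Everything specific here is an assumption made for illustration: the one-parameter family y = x / ||x||^gamma (gamma = 1 reduces to length normalization), the per-dimension histogram estimate of the KLD, the grid search, and the synthetic heavy-tailed "development set". None of these should be read as the paper's actual method.

```python
import numpy as np

def kl_to_standard_normal(samples, bins=60, lim=6.0):
    """Histogram estimate of KL(p || N(0, 1)) for 1-D samples (illustrative estimator)."""
    z = (samples - samples.mean()) / samples.std()   # standardize to zero mean, unit variance
    z = np.clip(z, -lim, lim)                        # keep all probability mass inside the range
    counts, edges = np.histogram(z, bins=bins, range=(-lim, lim))
    p = counts / counts.sum()                        # empirical bin probabilities
    centers = 0.5 * (edges[:-1] + edges[1:])
    q = np.exp(-0.5 * centers**2)                    # unnormalized Gaussian density at bin centers
    q /= q.sum()                                     # Gaussian bin probabilities
    mask = p > 0                                     # empty bins contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def divisive_normalize(x, gamma):
    """Hypothetical divisive transform: divide each row by its norm raised to gamma."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / norms**gamma

def mean_kl(x, gamma):
    """Average per-dimension KLD of the transformed data to N(0, 1)."""
    y = divisive_normalize(x, gamma)
    return float(np.mean([kl_to_standard_normal(y[:, d]) for d in range(y.shape[1])]))

def fit_gamma(dev, grid=np.linspace(0.0, 1.0, 11)):
    """Grid search for the gamma that minimizes the estimated KLD on development data."""
    scores = [mean_kl(dev, g) for g in grid]
    return float(grid[int(np.argmin(scores))])

# Synthetic stand-in for development i-vectors: multivariate Student-t (heavy tails).
rng = np.random.default_rng(0)
n, dim, df = 5000, 20, 3
dev = rng.standard_normal((n, dim)) / np.sqrt(rng.chisquare(df, size=(n, 1)) / df)

best_gamma = fit_gamma(dev)
print("selected gamma:", best_gamma)
print("KLD before:", mean_kl(dev, 0.0), "after:", mean_kl(dev, best_gamma))
```

In this toy setting, gamma = 0 leaves the data unchanged, so comparing the two printed KLD values shows how much the selected transform moves the marginals toward a Gaussian. A real system would fit the parameter on actual development i-vectors and then apply the same transform to enrollment and test i-vectors before PLDA scoring.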
