Geometric Discriminant Analysis for I-vector Based Speaker Verification

Many i-vector based speaker verification systems use linear discriminant analysis (LDA) as a post-processing stage. LDA maximizes the arithmetic mean of the Kullback-Leibler (KL) divergences between different pairs of speakers. In speaker verification, however, speakers separated by a small divergence are easily misjudged, and LDA is not optimal because it does not emphasize enlarging small divergences. In addition, LDA assumes that the i-vectors of different speakers are well modeled by Gaussian distributions with an identical class covariance, whereas in practice the distributions of different speakers can have different covariances. Motivated by these observations, we explore speaker verification with geometric discriminant analysis (GDA), which uses the geometric mean instead of the arithmetic mean when maximizing the KL divergences and therefore puts more emphasis on enlarging small divergences. Furthermore, we study the heteroscedastic extension of GDA (HGDA), which takes the different covariances into consideration. Experiments on the i-vector machine learning challenge indicate that, as the number of training speakers becomes smaller, the relative performance improvement of GDA and HGDA over LDA becomes larger. GDA and HGDA are therefore better choices, especially when training data is limited.
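The contrast between the two averaging schemes can be illustrated with a small numerical sketch. It is not the authors' implementation: the class means, the identity within-class covariance, the two candidate projections, and all function names are hypothetical values chosen for illustration, and it covers only the shared-covariance (homoscedastic) case, where the KL divergence between two projected Gaussian classes reduces to half the squared Mahalanobis distance between their projected means.

```python
import numpy as np
from itertools import combinations


def pairwise_kl(means, within_cov, W):
    """KL divergence of every class pair after projection by W.

    Assumes Gaussian classes sharing one covariance (`within_cov`), so each
    divergence is half the squared Mahalanobis distance between projected means.
    """
    proj_cov_inv = np.linalg.inv(W.T @ within_cov @ W)
    divs = []
    for i, j in combinations(range(len(means)), 2):
        d = W.T @ (means[i] - means[j])
        divs.append(0.5 * d @ proj_cov_inv @ d)
    return np.array(divs)


def arithmetic_mean(divs):
    # LDA-style criterion: dominated by the largest divergences.
    return float(np.mean(divs))


def geometric_mean(divs):
    # GDA-style criterion: one near-zero divergence pulls the whole score down.
    return float(np.exp(np.mean(np.log(divs))))


# Toy data (illustrative assumption): classes 0 and 1 nearly overlap along x.
means = np.array([[0.0, 0.0], [0.1, 3.0], [4.0, 0.0]])
within_cov = np.eye(2)

W_x = np.array([[1.0], [0.0]])                  # project onto the x-axis
W_diag = np.array([[1.0], [1.0]]) / np.sqrt(2)  # project onto the diagonal

divs_x = pairwise_kl(means, within_cov, W_x)
divs_diag = pairwise_kl(means, within_cov, W_diag)

for name, divs in [("x-axis", divs_x), ("diagonal", divs_diag)]:
    print(name, "arithmetic:", arithmetic_mean(divs),
          "geometric:", geometric_mean(divs))
```

In this toy setup the arithmetic mean prefers the x-axis projection even though it nearly collapses classes 0 and 1, while the geometric mean prefers the diagonal projection, which keeps all three pairs separated. The geometric mean is computed through the mean of logarithms, so it requires strictly positive divergences.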
