Speaker-Aware Linear Discriminant Analysis in Speaker Verification

Linear discriminant analysis (LDA) is an effective and widely used discriminative technique for speaker verification. However, it uses only global structure information to perform classification. Variants of LDA, such as local pairwise LDA (LPLDA), have been proposed to preserve more local structure information in the linear projection matrix. Because the local structure can vary considerably across regions, however, summing the related components into a single projection matrix may not be sufficient. In this paper, we present a speaker-aware strategy that preserves distinct local structure information in a set of linear discriminant projection matrices and allocates these matrices to different local regions for dimension reduction and classification. Experiments on NIST SRE2010 and NIST SRE2016 show that the speaker-aware strategy improves the performance of both LDA and LPLDA backends in i-vector and x-vector systems.
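To make the contrast between a single global projection and a set of region-specific projections concrete, the following is a minimal sketch rather than the method of this paper. `lda_projection` is classical LDA built from within-class and between-class scatter matrices. `local_lda_projections` is a hypothetical illustration of the "one projection per local region" idea: it assumes that a speaker's local region consists of that speaker plus its `n_neighbors` nearest speakers by Euclidean distance between speaker means, which the abstract does not specify. All function names, parameters, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def lda_projection(X, y, dim):
    """Classical LDA: leading `dim` eigenvectors of pinv(Sw) @ Sb."""
    d = X.shape[1]
    global_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        Sw += centered.T @ centered
        diff = (class_mean - global_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[:dim]
    return evecs[:, order].real


def local_lda_projections(X, y, dim, n_neighbors):
    """Hypothetical region-specific variant: one LDA matrix per speaker,
    fit on that speaker and its nearest speakers (an assumption, not the
    paper's definition of a local region)."""
    labels = np.unique(y)
    means = np.stack([X[y == label].mean(axis=0) for label in labels])
    projections = {}
    for i, label in enumerate(labels):
        dist = np.linalg.norm(means - means[i], axis=1)
        # The closest mean is the speaker itself, so take n_neighbors + 1.
        region = labels[np.argsort(dist)[: n_neighbors + 1]]
        mask = np.isin(y, region)
        projections[label] = lda_projection(X[mask], y[mask], dim)
    return projections


# Synthetic stand-in for speaker embeddings: 6 speakers, 10 vectors each.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(6), 10)
speaker_offsets = rng.normal(scale=3.0, size=(6, 5))
X = rng.normal(size=(60, 5)) + speaker_offsets[y]

W_global = lda_projection(X, y, dim=2)  # one matrix for all data
W_local = local_lda_projections(X, y, dim=2, n_neighbors=2)  # one per speaker
```

In this sketch the global matrix is shared by every trial, while a trial involving a given enrolled speaker would be projected with that speaker's own matrix. How the paper actually defines regions, builds the matrices, and assigns them at test time is not stated in the abstract.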
