i-Vector Selection for Effective PLDA Modeling in Speaker Recognition

Data selection is an important issue in speaker recognition. Previous studies have addressed data selection for universal background model (UBM) training and for the background dataset of support vector machines (SVM). In this paper, we address data selection for a probabilistic linear discriminant analysis (PLDA) model, which is one of the state-of-the-art methods for i-vector scoring. We first show that data selection using the conventional k-NN method does improve speaker verification performance. We then propose a robust way of selecting k by using a local distance-based outlier factor (LDOF). We call our method flexible k-NN, or fk-NN. It obtained significant performance improvements on both male and female trials of the NIST speaker recognition evaluation (SRE) 2006 core task, the NIST SRE 2008 core task (condition-6), and the NIST SRE 2010 coreext-coreext task (condition-5).
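The two building blocks named in the abstract can be illustrated with a minimal sketch. The abstract does not say how fk-NN derives k from LDOF, so that step is not shown. The sketch assumes Euclidean distance between i-vectors and a selection rule that keeps the union of each target vector's k nearest neighbours; both are assumptions made for illustration, as are the function names `ldof` and `knn_select`. The LDOF formula follows the usual definition: the mean distance from a point to its k nearest neighbours, divided by the mean pairwise distance among those neighbours.

```python
import numpy as np


def ldof(x, data, k):
    """Local distance-based outlier factor of x with respect to data.

    Assumes Euclidean distance, k >= 2, and that x is not a row of data.
    """
    dists = np.linalg.norm(data - x, axis=1)
    nn_idx = np.argsort(dists)[:k]
    # Mean distance from x to its k nearest neighbours.
    d_knn = dists[nn_idx].mean()
    # Mean pairwise distance among those neighbours (ordered pairs, i != j).
    nbrs = data[nn_idx]
    pairwise = np.linalg.norm(nbrs[:, None, :] - nbrs[None, :, :], axis=2)
    d_inner = pairwise.sum() / (k * (k - 1))
    return d_knn / d_inner


def knn_select(targets, pool, k):
    """Hypothetical selection rule: indices of pool vectors that are among
    the k nearest neighbours of at least one target vector."""
    selected = []
    for t in targets:
        dists = np.linalg.norm(pool - t, axis=1)
        selected.extend(np.argsort(dists)[:k])
    return np.unique(selected)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for i-vectors.
    pool = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
    print(ldof(np.array([0.0, 0.0]), pool, k=4))   # point inside its neighbourhood
    print(ldof(np.array([10.0, 0.0]), pool, k=4))  # point far from its neighbourhood
    print(knn_select(np.array([[0.9, 0.1]]), pool, k=2))
```

An LDOF value well above 1 indicates a point lying outside the cloud formed by its neighbours, which is what makes the quantity a plausible signal for deciding how far a neighbourhood should extend.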
