Utterance partitioning with acoustic vector resampling for i-vector based speaker verification

The i-vector has become a state-of-the-art technique for text-independent speaker verification. The major advantage of i-vectors is that they can represent speaker-dependent information in a low-dimensional Euclidean space, which opens up opportunities for using statistical techniques to suppress session and channel variability. This paper investigates how varying the conversation length and the number of training sessions per speaker affects the discriminative ability of i-vectors. It demonstrates that the amount of speaker-dependent information an i-vector can capture saturates once the utterance length exceeds a certain threshold. This finding motivates us to maximize the feature representation capability of i-vectors by partitioning a long conversation into a number of sub-utterances, thereby producing more i-vectors per conversation. Results on NIST 2010 SRE suggest that (1) using more i-vectors per conversation enhances the capability of linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) to suppress session variability, especially when the number of conversations per training speaker is limited; and (2) increasing the number of i-vectors per target speaker helps i-vector-based SVMs find better decision boundaries, enabling SVM scoring to outperform cosine distance scoring by 22% in minimum normalized detection cost function (DCF) and 9% in equal error rate (EER).
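The partitioning idea can be illustrated with a short sketch. The abstract only states that a long conversation is split into sub-utterances; the "acoustic vector resampling" part of the title is interpreted here as an assumption: before each partitioning pass the frame order is randomly permuted, the permuted sequence is cut into N segments, the pass is repeated R times, and the full utterance is kept as well, giving R·N + 1 sub-utterances. The function name `up_avr` and the parameters `n_partitions` and `n_repeats` are hypothetical, and i-vector extraction from each sub-utterance is not shown.

```python
import random
from typing import List, Sequence

# One acoustic vector (e.g. an MFCC frame); the feature type is an assumption.
Frame = Sequence[float]


def up_avr(frames: Sequence[Frame], n_partitions: int, n_repeats: int,
           seed: int = 0) -> List[List[Frame]]:
    """Hypothetical sketch of utterance partitioning with acoustic vector resampling.

    Returns the full utterance followed by n_repeats * n_partitions sub-utterances.
    Assumes "resampling" means randomly permuting the frame order before each pass.
    """
    if n_partitions < 1 or n_repeats < 0:
        raise ValueError("n_partitions must be >= 1 and n_repeats must be >= 0")
    if len(frames) < n_partitions:
        raise ValueError("utterance has fewer frames than requested partitions")

    rng = random.Random(seed)          # seeded so the output is reproducible
    subs: List[List[Frame]] = [list(frames)]  # keep the full-length utterance too

    for _ in range(n_repeats):
        shuffled = list(frames)
        rng.shuffle(shuffled)          # randomize the order of acoustic vectors
        size = len(shuffled) // n_partitions
        for k in range(n_partitions):
            start = k * size
            # The last segment absorbs any leftover frames (a choice of this sketch).
            end = (k + 1) * size if k < n_partitions - 1 else len(shuffled)
            subs.append(shuffled[start:end])
    return subs


if __name__ == "__main__":
    # Toy "utterance" of 1000 two-dimensional frames.
    utt = [[float(t), 0.5 * float(t)] for t in range(1000)]
    parts = up_avr(utt, n_partitions=4, n_repeats=2)
    print(len(parts), [len(p) for p in parts])
```

Under these assumptions, each returned sub-utterance would be passed to an i-vector extractor, so one conversation yields R·N + 1 i-vectors instead of one. Because the frames are shuffled before cutting, every segment draws frames from across the whole conversation rather than from one contiguous stretch of it.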

Man-Wai Mak | Wei Rao