Experiments in SVM-based Speaker Verification Using Short Utterances

This paper investigates the effects of limited speech data on speaker verification using the Gaussian mixture model (GMM) mean supervector support vector machine (SVM) classifier. This classifier provides state-of-the-art performance when sufficient speech is available; however, its robustness to limited speech resources has not yet been established. Verification performance is analysed with regard to the duration of the impostor utterances used in the background, score normalisation and session compensation training cohorts. Results highlight the importance of matching the speech duration of utterances in these cohorts to the expected evaluation conditions. Performance was found to be particularly sensitive to the utterance duration of examples in the background dataset. It was also found that the nuisance attribute projection (NAP) approach to session compensation often degrades performance when both training and testing data are limited. An analysis of session and speaker variability in the mean supervector space provides some insight into the cause of this phenomenon.
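To make the session compensation step concrete, here is a minimal sketch of nuisance attribute projection as it is generally described for GMM supervector SVMs. It is not the authors' implementation. It assumes the mean supervectors have already been extracted (for example, by MAP-adapting UBM means and stacking them) and are given as plain Python lists with speaker labels. The function names, the power-iteration eigensolver and the synthetic data are all hypothetical choices made for illustration. NAP estimates the dominant directions of within-speaker (session) scatter in a training cohort and removes them from every supervector via the projection P = I - U U^T.

```python
import math
import random
from collections import defaultdict


def _matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def within_speaker_scatter(supervectors, speakers):
    """Scatter of supervectors about their own speaker's mean (session variability)."""
    dim = len(supervectors[0])
    groups = defaultdict(list)
    for sv, spk in zip(supervectors, speakers):
        groups[spk].append(sv)
    scatter = [[0.0] * dim for _ in range(dim)]
    for vecs in groups.values():
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        for v in vecs:
            d = [a - b for a, b in zip(v, mean)]
            for i in range(dim):
                for j in range(dim):
                    scatter[i][j] += d[i] * d[j]
    return scatter


def top_eigenvectors(matrix, k, iters=200, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix (power iteration + deflation).

    Hypothetical stand-in for a proper eigensolver; adequate for a small sketch.
    """
    rng = random.Random(seed)
    dim = len(matrix)
    a = [row[:] for row in matrix]  # working copy, deflated after each direction
    basis = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            w = _matvec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                return basis  # remaining scatter is zero: no further nuisance directions
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, _matvec(a, v)))  # Rayleigh quotient
        for i in range(dim):
            for j in range(dim):
                a[i][j] -= lam * v[i] * v[j]  # Hotelling deflation
        basis.append(v)
    return basis


def nap_project(sv, basis):
    """Apply P = I - U U^T, assuming the columns of U (basis) are orthonormal."""
    out = list(sv)
    for u in basis:
        coef = sum(a * b for a, b in zip(out, u))
        out = [a - coef * b for a, b in zip(out, u)]
    return out
```

In a toy cohort where two speakers differ along one axis and sessions vary along another, projecting out the single leading nuisance direction collapses each speaker's sessions onto one point while keeping the speakers apart. In practice the number of removed directions k is a tuning parameter, and the cohort used to estimate the scatter is the session compensation training cohort whose utterance duration the paper examines.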
