Influence of task duration in text-independent speaker verification

Short-duration tasks in text-independent speaker verification have received relatively little attention compared with tasks involving many minutes of speech. In this paper we investigate verification performance over a range of durations, from a few seconds to a few minutes. We begin with a state-of-the-art GMM-based system operating on a few minutes of speech per person and show that the same system is suboptimal on short (10-second) speech recordings. In particular, we highlight that the optimal frame selection depends on the overall duration. This work sheds some light on the difficulties of transposing recent and important techniques such as SVMNAP to short-duration tasks.
