The SRI NIST 2010 speaker recognition evaluation system

The SRI speaker recognition system for the 2010 NIST speaker recognition evaluation (SRE) combines multiple subsystems that use a variety of features and modeling techniques. We describe our strategy for this year's evaluation, from speech recognition and speech segmentation, through the individual subsystems, to the final combination. Our results show that under most conditions the cepstral systems tend to perform best, whereas the non-cepstral systems contribute the most complementary information. Combining several subsystems with appropriate side information yields a 35% improvement on the standard telephone condition. We also show that a constrained cepstral system based on nasal syllables tends to be more robust to variability in vocal effort.
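The abstract does not say how the subsystem scores are combined or how side information enters the combination. As an illustration only, the sketch below assumes score-level fusion by linear logistic regression, with a categorical side-information value (hypothetical labels such as `"tel"` and `"mic"`) selecting a separate set of fusion weights for each class of trial. This is a generic technique, not a description of the SRI combiner, and the function names `train_fusion`, `train_side_info_fusion` and `fuse` are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_fusion(scores, labels, epochs=2000, lr=0.1):
    """Fit fused = w . scores + b by logistic regression (batch gradient descent).

    scores: list of per-trial lists of subsystem scores.
    labels: 1 for a target trial, 0 for an impostor trial.
    """
    dim = len(scores[0])
    w = [0.0] * dim
    b = 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(scores, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the cross-entropy loss w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_side_info_fusion(trials, epochs=2000, lr=0.1):
    """Train one fuser per side-information class.

    trials: list of (side_info, subsystem_scores, label) tuples.
    Returns a dict mapping side_info -> (weights, bias).
    """
    by_class = {}
    for side, x, y in trials:
        xs, ys = by_class.setdefault(side, ([], []))
        xs.append(x)
        ys.append(y)
    return {side: train_fusion(xs, ys, epochs, lr) for side, (xs, ys) in by_class.items()}


def fuse(models, side_info, scores):
    # Apply the fuser selected by the trial's side information.
    w, b = models[side_info]
    return sum(wi * xi for wi, xi in zip(w, scores)) + b
```

Conditioning the weights on side information lets a more reliable subsystem receive more weight in the conditions where it is more reliable; with synthetic scores in which the first subsystem is less noisy than the second, the trained fuser assigns it the larger weight.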