The UMD-JHU 2011 speaker recognition system

In recent years, significant advances in the field of speaker recognition have resulted in very robust recognition systems. The primary focus of many recent developments has shifted to the problem of recognizing speakers in adverse conditions, e.g., in the presence of noise or reverberation. In this paper, we present the UMD-JHU speaker recognition system applied to the NIST 2010 SRE task. The novel aspects of our system are: 1) improved performance on trials involving different vocal effort via the use of linear-scale features; 2) expected improved recognition performance in the presence of reverberation and noise via the use of frequency domain perceptual linear predictor and cortical features; 3) a new discriminative kernel partial least squares (KPLS) framework that complements the state-of-the-art back-end systems JFA and PLDA to improve overall recognition; and 4) acceleration of the JFA, PLDA and KPLS back-ends via distributed computing. The individual components of the system and the fused system are compared against a baseline JFA system and against results reported by SRI and MIT-LL on SRE2010.
