Loquendo - Politecnico di Torino's 2010 NIST speaker recognition evaluation system

This paper describes the improvement introduced in the Loquendo-Politecnico di Torino (LPT) speaker recognition system submitted to the NIST SRE10 evaluation campaign. This system combines the results of eight core acoustic systems all based on Gaussian Mixture Models (GMMs). We illustrate the key factors, in the selection of the development data and in engineering state-of-the art technology, which contributed to the very good performance and calibration of our system in all the test conditions proposed in this evaluation.

[1]  Douglas E. Sturim,et al.  SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[2]  Stan Davis,et al.  Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .

[3]  Patrick Kenny,et al.  Disentangling speaker and channel effects in speaker verification , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Niko Brümmer,et al.  Application-independent evaluation of speaker detection , 2006, Comput. Speech Lang..

[5]  Sridha Sridharan,et al.  Modelling session variability in text-independent speaker verification , 2005, INTERSPEECH.

[6]  Patrick Kenny,et al.  New MAP estimators for speaker recognition , 2003, INTERSPEECH.

[7]  Sridha Sridharan,et al.  Feature warping for robust speaker verification , 2001, Odyssey.

[8]  Douglas E. Sturim,et al.  Speaker adaptive cohort selection for Tnorm in text-independent speaker verification , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[9]  Christopher M. Bishop,et al.  Mixtures of Probabilistic Principal Component Analyzers , 1999, Neural Computation.

[10]  Niko Brümmer,et al.  The speaker partitioning problem , 2010, Odyssey.

[11]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[12]  Douglas A. Reynolds,et al.  Analysis of multitarget detection for speaker and language recognition , 2004, Odyssey.

[13]  Moshe Wasserblat,et al.  How to Deal with Multiple-Targets in Speaker Identification Systems? , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[14]  Tsuhan Chen,et al.  Improved speaker verification through probabilistic subspace adaptation , 2003, INTERSPEECH.

[15]  Pietro Laface,et al.  Channel Factors Compensation in Model and Feature Domain for Speaker Recognition , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[16]  Roland Kuhn,et al.  Rapid speaker adaptation in eigenvoice space , 2000, IEEE Trans. Speech Audio Process..

[17]  Pietro Laface,et al.  Comparison of Speaker Recognition Approaches for Real Applications , 2011, INTERSPEECH.

[18]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[19]  Pietro Laface,et al.  Unsupervised segmentation and verification of multi-speaker conversational speech , 2005, INTERSPEECH.

[20]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[21]  Patrick Kenny,et al.  Joint Factor Analysis Versus Eigenchannels in Speaker Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[22]  Patrick Kenny,et al.  Feature normalization using smoothed mixture transformations , 2006, INTERSPEECH.

[23]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[24]  Patrick Kenny,et al.  Bayesian Speaker Verification with Heavy-Tailed Priors , 2010, Odyssey.

[25]  Douglas A. Reynolds,et al.  Channel robust speaker verification via feature mapping , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[26]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[27]  Pietro Laface,et al.  Compensation of Nuisance Factors for Speaker and Language Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.