Experiments in speaker verification using factor analysis likelihood ratios

We report the results of speaker verification experiments on the NIST 1999 and NIST 2000 test sets using factor analysis likelihood ratio statistics. For the experiments on the 1999 test set, we had to train the factor analysis model on a mismatched training set, namely Phases 1 and 2 of the Switchboard II corpus. Our results on this test set are comparable to (but not better than) the best results that have been attained with standard methods (GMM likelihood ratios and handset detection). To experiment with well-matched training and test sets, we used half of the target speakers in the NIST 2000 evaluation for testing and a disjoint set of speakers taken from Switchboard II, Phases 1 and 2, for training. In this situation we obtained an equal error rate of 7.2% and a minimum detection cost of 0.028. These figures represent an improvement of about 25% over standard methods.
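
For readers unfamiliar with the two figures of merit quoted above, the following is a minimal sketch of how an equal error rate and a minimum detection cost can be computed from a list of verification scores. It is not the evaluation code used in the experiments. The function names and the toy scores are hypothetical, and the cost parameters (a miss cost of 10, a false alarm cost of 1 and a target prior of 0.01) are an assumption based on the values customarily used in the NIST evaluations of that period; the abstract itself does not state them.

```python
def error_rates(target_scores, impostor_scores):
    """Return (threshold, p_miss, p_fa) for every candidate threshold.

    A trial is accepted when its score is >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    thresholds.append(float("inf"))  # operating point that rejects every trial
    points = []
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial scored at or above the threshold.
        p_fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((t, p_miss, p_fa))
    return points


def equal_error_rate(points):
    """Approximate the EER at the threshold where p_miss and p_fa are closest."""
    _, p_miss, p_fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2


def min_detection_cost(points, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum over thresholds of the (unnormalized) detection cost.

    The default cost parameters are an assumption, not taken from the abstract.
    """
    return min(
        c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        for _, p_miss, p_fa in points
    )


# Toy scores, invented purely for illustration.
toy_targets = [2.0, 1.5, 0.5, 3.0]
toy_impostors = [0.0, 1.0, -1.0, 0.2]
toy_points = error_rates(toy_targets, toy_impostors)
print("EER:", equal_error_rate(toy_points))
print("min DCF:", min_detection_cost(toy_points))
```

The equal error rate summarizes a single operating point where misses and false alarms are equally frequent, whereas the minimum detection cost weights the two error types unequally and is taken at the best threshold chosen after the fact, which is why the two numbers are usually reported together.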