MLLR techniques for speaker recognition

Maximum-Likelihood Linear Regression (MLLR) and Constrained MLLR (CMLLR) have been recently used for feature extraction in speaker recognition. These systems use (C)MLLR transforms as features that are modeled with Support Vector Machines (SVM). This paper evaluates and compares several of these approaches for the NIST Speaker Recognition task. Single CMLLR and up to 4-phonetic-class MLLR transforms are explored using Gaussian Mixture Models (GMM) and large-vocabulary speech recognition Hidden Markov Models (HMM), using both speaker recognition and speech recognition cepstral front-ends and normalizations. Results for the individual systems as well as in combination with two standard cep-stral systems are provided. Relative gains of 3% and 12% were obtained when combining the best performing CMLLR-based and MLLR-based systems with two standard cepstral systems, respectively.

[1]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[2]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[3]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[4]  Andreas Stolcke,et al.  Improvements in MLLR-Transform-based Speaker Recognition , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[5]  Bernhard Schölkopf,et al.  Kernel Principal Component Analysis , 1997, ICANN.

[6]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[7]  Cheung-Chi Leung,et al.  Constrained MLLR for Speaker Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[8]  Andreas Stolcke,et al.  MLLR transforms as features in speaker recognition , 2005, INTERSPEECH.

[9]  Douglas A. Reynolds,et al.  Channel robust speaker verification via feature mapping , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[10]  Jean-Luc Gauvain,et al.  Speaker adaptation based on MAP estimation of HMM parameters , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11]  Richard M. Schwartz,et al.  The 2004 BBN/LIMSI 20xRT English conversational telephone speech recognition system , 2005, INTERSPEECH.

[12]  Mark J. F. Gales,et al.  Mean and variance adaptation within the MLLR framework , 1996, Comput. Speech Lang..

[13]  David A. van Leeuwen,et al.  STBU System for the NIST 2006 Speaker Recognition Evaluation , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[14]  Jean-Luc Gauvain,et al.  Feature and score normalization for speaker verification of cellular data , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[15]  Francis Kubala,et al.  Fast Robust Inverse Transform SAT and Multi-stage Adaptation , 1998 .

[16]  William M. Campbell,et al.  Generalized linear discriminant sequence kernels for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  Sridha Sridharan,et al.  Feature warping for robust speaker verification , 2001, Odyssey.