Lattice-based MLLR for speaker recognition

Maximum-Likelihood Linear Regression (MLLR) transform coefficients have been shown to be useful features for text-independent speaker recognition. Such systems use MLLR coefficients computed with a Large Vocabulary Continuous Speech Recognition (LVCSR) system as features and Support Vector Machines (SVMs) for classification. However, performance is limited by the quality of the transcripts, which often have high word error rates (WER) for spontaneous telephone speech. In this paper, we propose lattice-based MLLR to overcome this issue. By using word lattices instead of 1-best hypotheses, more hypotheses are considered for MLLR estimation, so better models are more likely to be used. Unlike standard MLLR, language model probabilities are taken into account as well. We show that systems using lattice MLLR outperform standard MLLR systems on the Speaker Recognition Evaluation (SRE) 2006. A comparison with other standard acoustic systems is also provided.
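To make the difference between 1-best and lattice-based estimation concrete, the following is a minimal sketch of a single global MLLR mean transform estimated from per-frame Gaussian occupancies. It is not the implementation used in the paper. It assumes identity covariances for all Gaussians (which yields a closed-form solution), and the names `estimate_mllr_transform`, `transform_to_feature`, and `gamma` are hypothetical. With a 1-best transcript, `gamma` would come from a single alignment path; in the lattice case it would instead hold occupancies accumulated over all lattice paths, weighted by posteriors that combine acoustic and language model scores.

```python
import numpy as np

def estimate_mllr_transform(obs, means, gamma):
    """Estimate one global MLLR mean transform W of shape (D, D+1).

    obs:   (T, D) acoustic feature frames
    means: (M, D) Gaussian means of the speaker-independent model
    gamma: (T, M) occupancy of Gaussian m at frame t
           (single-path for 1-best, posterior-weighted for lattices)

    Assumption: all Gaussians have identity covariance, so the
    maximum-likelihood solution reduces to W = Z G^{-1}.
    """
    num_gauss = means.shape[0]
    # Extended mean vectors xi_m = [1, mu_m], one row per Gaussian
    xi = np.hstack([np.ones((num_gauss, 1)), means])
    # Z = sum_t sum_m gamma[t, m] * o_t * xi_m^T
    Z = obs.T @ gamma @ xi
    # G = sum_m (sum_t gamma[t, m]) * xi_m * xi_m^T
    occ = gamma.sum(axis=0)
    G = (xi * occ[:, None]).T @ xi
    # Pseudo-inverse guards against a poorly conditioned G
    return Z @ np.linalg.pinv(G)

def transform_to_feature(W):
    """Flatten the transform into a vector usable as an SVM input."""
    return W.reshape(-1)
```

Only the way `gamma` is obtained changes between the two variants; the accumulation and the solve stay the same, which is why considering more hypotheses can change the estimated transform without changing the rest of the pipeline.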
