Environment adaptation and long term parameters in speaker identification

In this paper, we integrate two different techniques into a GMM-based speaker identification system: (a) Maximum Likelihood Linear Regression (MLLR), a transformation that adapts the system to a new environment by modifying the continuous densities of the GMM mixtures. We apply MLLR for environmental compensation, reducing the mismatch caused by channel or additive noise effects; (b) Linear Discriminant Analysis (LDA) applied to sequences of acoustic vectors. LDA designs a linear transformation that extracts from these sequences a set of discriminant parameters maximizing class separability. Previous work has shown that applying LDA to speech recognition improves recognition performance. We use this approach to extract features that are more invariant to non-speaker-related conditions such as handset type and channel effects. Experiments are conducted on the 45-speaker SPIDRE database.
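As an illustration only, the two operations named in the abstract can be sketched in NumPy. This is not the authors' implementation: the function names (`apply_mllr_mean`, `stack_frames`, `fit_lda`), the context width, and the regularization constant `reg` are assumptions made for the example. The MLLR part shows only the standard affine update of the Gaussian means, taking the transform `A`, `b` as given; estimating it from adaptation data is not shown. The LDA part stacks neighbouring acoustic vectors into one long vector and solves the usual between-class versus within-class scatter eigenproblem, with speakers as classes.

```python
import numpy as np


def apply_mllr_mean(means, A, b):
    """Apply an affine MLLR-style update mu' = A @ mu + b to every mixture mean.

    means: (n_mixtures, dim) array of GMM means.
    A, b:  transform assumed to be already estimated (estimation not shown).
    """
    return means @ A.T + b


def stack_frames(frames, context):
    """Concatenate each frame with `context` neighbours on both sides.

    frames: (n_frames, dim) array; returns (n_frames - 2*context, (2*context+1)*dim).
    """
    width = 2 * context + 1
    n_frames = frames.shape[0]
    return np.stack(
        [frames[t:t + width].ravel() for t in range(n_frames - width + 1)]
    )


def fit_lda(X, y, n_components, reg=1e-6):
    """Return a (dim, n_components) projection maximizing class separability.

    X: (n_samples, dim) stacked feature vectors, y: speaker label per sample.
    `reg` is a small ridge term (an assumption) keeping the within-class
    scatter matrix invertible.
    """
    dim = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((dim, dim))  # within-class scatter
    Sb = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        gap = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (gap @ gap.T)
    Sw += reg * np.eye(dim)

    # Solve Sb w = lambda Sw w by whitening with the Cholesky factor of Sw,
    # which turns it into a symmetric eigenproblem.
    L = np.linalg.cholesky(Sw)
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    return L_inv.T @ evecs[:, order]
```

Projecting stacked vectors is then `X @ W`. With S speakers, at most S - 1 directions carry between-class information, so `n_components` would normally be chosen no larger than that.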
