Fast adaptation of GMM-based compact models

In this paper, a new strategy for fast adaptation of acoustic models is proposed for embedded speech recognition. It relies on a general GMM, which represents the whole acoustic space, associated with a set of HMM state-dependent probability functions modeled as transformations of this GMM. The work presented here takes advantage of this architecture to propose a fast and efficient way to adapt the acoustic models. Adaptation is performed only on the general GMM, using techniques borrowed from the speaker recognition domain. It does not require state-dependent adaptation data and is very efficient in terms of computational cost. We evaluate our approach on a voice-command task, using a car-based corpus. This adaptation method achieved a relative error-rate decrease of about 10%, even when little adaptation data is available. The complete system yields a total relative gain of more than 20% compared to a basic HMM-based system.

Index Terms: speech recognition, compact acoustic models, adaptation
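The abstract does not give the adaptation equations, so the following is only a minimal sketch under stated assumptions. It assumes the "speaker recognition" technique is relevance-factor MAP adaptation of the GMM means (the usual GMM-UBM approach), and it assumes, for illustration, that each HMM state reuses the general GMM's Gaussians with its own weight vector; the paper's actual state-dependent transformations may be richer. The names `DiagGMM`, `map_adapt_means`, `state_log_likelihood`, the relevance factor of 16, and the toy data are hypothetical.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Sequence

Vector = List[float]


@dataclass(frozen=True)
class DiagGMM:
    """General GMM covering the whole acoustic space (diagonal covariances)."""
    weights: List[float]      # mixture weights, assumed positive and summing to 1
    means: List[Vector]       # one mean vector per Gaussian component
    variances: List[Vector]   # one diagonal variance vector per component


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def weighted_component_scores(gmm: DiagGMM, weights: Sequence[float], x: Sequence[float]) -> List[float]:
    """log(w_k) + log N(x; mu_k, Sigma_k) for every shared Gaussian k."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, gmm.means, gmm.variances)
    ]


def posteriors(gmm: DiagGMM, x: Sequence[float]) -> List[float]:
    """Occupation probabilities gamma_k(x) of the general GMM's components."""
    scores = weighted_component_scores(gmm, gmm.weights, x)
    total = log_sum_exp(scores)
    return [math.exp(s - total) for s in scores]


def map_adapt_means(gmm: DiagGMM, frames: Sequence[Sequence[float]], relevance: float = 16.0) -> DiagGMM:
    """Relevance-factor MAP adaptation of the means only (weights and variances kept).

    n_k     = sum_t gamma_k(x_t)
    E_k     = (1 / n_k) * sum_t gamma_k(x_t) * x_t
    alpha_k = n_k / (n_k + relevance)
    mu_k'   = alpha_k * E_k + (1 - alpha_k) * mu_k
    """
    k, dim = len(gmm.means), len(gmm.means[0])
    counts = [0.0] * k
    first_order = [[0.0] * dim for _ in range(k)]
    for x in frames:
        for i, g in enumerate(posteriors(gmm, x)):
            counts[i] += g
            for d in range(dim):
                first_order[i][d] += g * x[d]
    new_means = []
    for i in range(k):
        alpha = counts[i] / (counts[i] + relevance)
        # Components that saw no data at all fall back to the prior mean.
        data_mean = [s / counts[i] for s in first_order[i]] if counts[i] > 0 else gmm.means[i]
        new_means.append([alpha * e + (1.0 - alpha) * m for e, m in zip(data_mean, gmm.means[i])])
    return replace(gmm, means=new_means)


def state_log_likelihood(gmm: DiagGMM, state_weights: Sequence[float], x: Sequence[float]) -> float:
    """Hypothetical state model: the shared Gaussians re-weighted per HMM state."""
    return log_sum_exp(weighted_component_scores(gmm, state_weights, x))


# Usage: adapt a toy 2-component general GMM to a few speaker frames.
general = DiagGMM(weights=[0.5, 0.5], means=[[0.0, 0.0], [4.0, 4.0]],
                  variances=[[1.0, 1.0], [1.0, 1.0]])
speaker_frames = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]
adapted = map_adapt_means(general, speaker_frames)
state_weights = [0.9, 0.1]  # hypothetical weights of one HMM state
before = state_log_likelihood(general, state_weights, [1.0, 1.0])
after = state_log_likelihood(adapted, state_weights, [1.0, 1.0])
print(f"state log-likelihood before: {before:.3f}, after: {after:.3f}")
```

In this sketch only the shared Gaussians move, so every state that reuses them picks up the speaker adaptation without any state-specific data, which is the property the abstract emphasizes. The relevance factor limits how far a component moves when it has seen few frames, which matters when little adaptation data is available.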
