Compact Acoustic Models for Embedded Speech Recognition

Speech recognition applications are known to require significant resources, yet embedded speech recognition allows only a few KB of memory, a few MIPS, and a small amount of training data. To fit the resource constraints of embedded applications, an approach based on a semicontinuous HMM system with state-independent acoustic modelling is proposed. A transformation is computed and applied to the global model to obtain the state-dependent probability density functions of each HMM, so that only the transformation parameters need to be stored. This approach is evaluated on two tasks: digit and voice-command recognition. A fast adaptation technique for the acoustic models is also proposed: to significantly reduce computational costs, the adaptation is performed only on the global model (using techniques related to speaker-recognition adaptation), with no need for state-dependent data. The whole approach yields a relative gain of more than 20% over a baseline HMM-based system fitting the same constraints.
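The storage saving behind this idea can be sketched as follows. In this minimal illustration (all names, shapes, and the choice of an affine mean transform are assumptions, not the paper's exact formulation), every HMM state shares one global Gaussian mixture and stores only a small transform that maps the global means to state-dependent means:

```python
import numpy as np

# Hypothetical sketch: a shared "global" Gaussian mixture plus a small
# per-state transform, so each HMM state stores only transform parameters
# instead of a full state-specific mixture.

DIM, N_GAUSS = 13, 4            # feature dimension, mixture size (illustrative)
rng = np.random.default_rng(0)

# Global model shared by all states: weights, means, diagonal variances.
global_weights = np.full(N_GAUSS, 1.0 / N_GAUSS)
global_means = rng.normal(size=(N_GAUSS, DIM))
global_vars = np.ones((N_GAUSS, DIM))

def state_means(A, b):
    """Derive state-dependent means by an affine transform of the
    global means; A and b are the only parameters stored per state."""
    return global_means @ A.T + b

def log_likelihood(x, A, b):
    """Log-density of one feature frame x under the transformed mixture."""
    diff = x - state_means(A, b)                        # (N_GAUSS, DIM)
    log_g = -0.5 * np.sum(diff**2 / global_vars
                          + np.log(2 * np.pi * global_vars), axis=1)
    return np.log(np.sum(global_weights * np.exp(log_g)))

# Per-state storage: DIM*DIM + DIM transform values, versus
# N_GAUSS * (2*DIM) + N_GAUSS parameters for a full private mixture.
A, b = np.eye(DIM), np.zeros(DIM)
x = rng.normal(size=DIM)
print(log_likelihood(x, A, b))
```

With a diagonal or block-diagonal `A` the per-state cost shrinks further, which is the kind of trade-off an embedded system would tune against memory budget.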
