Continuous Speech Recognition at LIMSI

This paper presents some of the recent research on speaker-independent continuous speech recognition at LIMSI, including efforts in phone and word recognition for both French and English. Evaluation of an HMM-based phone recognizer on a subset of the BREF corpus gives a phone accuracy of 67.1% with 35 context-independent phone models and 74.2% with 428 context-dependent phone models. The word accuracy is 88% for a 1139-word lexicon and 86% for a 2716-word lexicon, using a word-pair grammar with respective perplexities of 101 and 160. Phone recognition is also shown to be effective for language, sex, and speaker identification.

The second part of the paper describes the recognizer used for the September 1992 (Sep92) Resource Management evaluation test. The HMM-based word recognizer is built by concatenating the phone models for each word, where each phone model is a 3-state left-to-right HMM with Gaussian mixture observation densities. Separate male and female models are run in parallel. The lexicon is represented with a reduced set of 36 phones to permit additional sharing of contexts. Intra- and inter-word phonological rules are optionally applied during training and recognition; these rules attempt to account for some of the phonological variations observed in fluent speech. The speaker-independent word accuracy on the Sep92 test data was 95.6%. On the earlier test sets, which were used for development, the word accuracies are 96.7% (Jun88), 97.5% (Feb89), 96.7% (Oct89), and 97.4% (Feb91).
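
The phone and word accuracies quoted above follow, by the usual convention in speech recognition scoring, from aligning the recognized string against the reference: accuracy = 100 * (N - S - D - I) / N, where N is the number of reference tokens and S, D and I count substitutions, deletions and insertions. Because insertions are penalized, accuracy can even become negative. The following is a minimal sketch assuming a unit-cost Levenshtein alignment. Official scoring tools may weight the alignment differently, so this illustrates the metric rather than reproducing the evaluation. The function names and the example sentences are hypothetical.

```python
def edit_counts(ref, hyp):
    """Align hyp against ref with unit-cost Levenshtein DP and return
    (substitutions, deletions, insertions) along a minimum-error path."""
    n, m = len(ref), len(hyp)
    # cell[i][j] = (errors, subs, dels, ins) for ref[:i] vs hyp[:j];
    # tuples compare on total errors first, so min() keeps an optimal path.
    cell = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)          # delete every reference token
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)          # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = cell[i - 1][j - 1]
            miss = int(ref[i - 1] != hyp[j - 1])
            diag = (e + miss, s + miss, d, k)      # match or substitution
            e, s, d, k = cell[i - 1][j]
            up = (e + 1, s, d + 1, k)              # deletion
            e, s, d, k = cell[i][j - 1]
            left = (e + 1, s, d, k + 1)            # insertion
            cell[i][j] = min(diag, up, left)
    _, s, d, k = cell[n][m]
    return s, d, k


def word_accuracy(ref, hyp):
    """Percent accuracy = 100 * (N - S - D - I) / N; negative when insertions dominate."""
    if not ref:
        raise ValueError("reference must be non-empty")
    s, d, k = edit_counts(ref, hyp)
    return 100.0 * (len(ref) - s - d - k) / len(ref)


# Hypothetical reference and recognizer output
ref = "show the ships in the gulf".split()
hyp = "show ships in a the gulf".split()
print(edit_counts(ref, hyp))              # (0, 1, 1): "the" deleted, "a" inserted
print(round(word_accuracy(ref, hyp), 1))  # 66.7
```

The same function scores phone strings unchanged, since it only compares tokens for equality.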

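The word-model construction described above can be illustrated with a toy sketch. Each word HMM is the concatenation of its phones' 3-state left-to-right models, and each state emits through a diagonal-covariance Gaussian mixture. An input is scored by Viterbi under both a male and a female model set, and the better-scoring result is kept. Keeping the higher-likelihood hypothesis is only one plausible reading of "run in parallel", because the abstract does not say how the two outputs are combined. Everything else here is also an assumption made for illustration: the function names, the 1-D features, the single self-loop probability per state, and the isolated-word scoring. The recognizer described here decodes continuous speech, shares contexts across phone models and can apply phonological rules, and the sketch models none of these.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v
                      for xi, mu, v in zip(x, mean, var))


def log_gmm(x, mixture):
    """Log density of a Gaussian mixture [(weight, mean, var), ...] via log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, mu, v) for w, mu, v in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def build_word_model(pronunciation, phone_models):
    """Concatenate the 3-state left-to-right phone HMMs of a pronunciation into
    one word HMM, represented as a flat list of (mixture, self_loop_prob) states."""
    states = []
    for phone in pronunciation:
        states.extend(phone_models[phone])
    return states


def viterbi_score(states, frames):
    """Best-path log-likelihood through a left-to-right HMM that must start in the
    first state and end in the last; each frame either stays or advances one state.
    The exit transition out of the last state is ignored in this sketch."""
    n = len(states)
    delta = [-math.inf] * n
    delta[0] = log_gmm(frames[0], states[0][0])
    for x in frames[1:]:
        new = [-math.inf] * n
        for j, (mixture, self_loop) in enumerate(states):
            stay = delta[j] + math.log(self_loop)
            move = (delta[j - 1] + math.log(1.0 - states[j - 1][1])) if j > 0 else -math.inf
            best = max(stay, move)
            if best > -math.inf:
                new[j] = best + log_gmm(x, mixture)
        delta = new
    return delta[-1]


def recognize(frames, lexicon, model_sets):
    """Score every word with every model set (e.g. male and female) and return
    (word, model_set_name, log_likelihood) of the best-scoring combination."""
    best = (None, None, -math.inf)
    for set_name, phone_models in model_sets.items():
        for word, pronunciation in lexicon.items():
            score = viterbi_score(build_word_model(pronunciation, phone_models), frames)
            if score > best[2]:
                best = (word, set_name, score)
    return best


def toy_phone(mean, var=1.0, self_loop=0.6):
    """Hypothetical toy phone model: three identical 1-D single-Gaussian states."""
    return [([(1.0, [mean], [var])], self_loop)] * 3


model_sets = {
    "male":   {"a": toy_phone(0.0), "b": toy_phone(5.0)},
    "female": {"a": toy_phone(1.0), "b": toy_phone(6.0)},
}
lexicon = {"ab": ["a", "b"], "ba": ["b", "a"]}
frames = [[1.0]] * 3 + [[6.0]] * 3   # six hypothetical 1-D feature frames
print(recognize(frames, lexicon, model_sets)[:2])   # ('ab', 'female')
```

Because the word model is just a list of phone states, adding a word to the lexicon requires no new acoustic parameters, only a pronunciation over the existing phone set.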