Improving continuous speech recognition in Spanish by phone-class semicontinuous HMMs with pausing and multiple pronunciations

Abstract: This paper presents a comprehensive study of continuous speech recognition in Spanish. It describes the use and optimisation of several well-known techniques and, for the first time for Spanish, the application of language-specific knowledge to these systems: the careful selection of the phone inventory, of the phone-classes used, and of the alternative pronunciation rules. We have developed semicontinuous phone-class-dependent contextual modelling. Using four phone-classes, we have obtained reductions in recognition error rate roughly equivalent to the percentage increase in the number of parameters, compared with baseline semicontinuous contextual modelling. We also show that using pausing in the training system and multiple pronunciations in the vocabulary improves recognition rates significantly. Taking into account the actual pausing of the training sentences and applying assimilation effects improves the transcription into context-dependent units. Multiple pronunciations are generated by general rules that are easily applied to any Spanish vocabulary. Together, these ideas reduce the recognition errors of the baseline system by more than 30% on a task parallel to DARPA-RM, translated into Spanish, with a vocabulary of 979 words. Our database contains four speakers, each with 600 training sentences and 100 testing sentences. All experiments were carried out with a perplexity of 979 (slightly higher when multiple pronunciations are used), so that the acoustic modelling power of the systems could be studied with no grammar constraints.
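
The perplexity figure follows from the absence of a grammar: if every vocabulary entry is equally likely after every word, the perplexity equals the number of entries. The short Python sketch below illustrates this arithmetic only. It is not the authors' code, the `perplexity` helper is a hypothetical name, and the count of 30 extra pronunciation variants is an invented value (not a figure from the paper), chosen to show why listing alternative pronunciations as separate entries would raise the perplexity slightly.

```python
import math


def perplexity(probs):
    """Return 2**H, where H is the entropy (in bits) of a next-word distribution."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


VOCAB_SIZE = 979  # vocabulary size stated in the abstract

# No grammar: every word is equally likely after any other word.
pp_uniform = perplexity([1 / VOCAB_SIZE] * VOCAB_SIZE)

# Assumption for illustration only: 30 alternative pronunciations
# are added to the lexicon as separate, equally likely entries.
EXTRA_VARIANTS = 30
n_entries = VOCAB_SIZE + EXTRA_VARIANTS
pp_variants = perplexity([1 / n_entries] * n_entries)

print(round(pp_uniform), round(pp_variants))
```

Under these assumptions the uniform case gives a perplexity equal to the vocabulary size, and each added entry raises it by one.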
