An investigation of modelling aspects for rate-dependent speech recognition

Many approaches have been suggested for modelling speech rate variation in speech recognition, but the training of speech-rate-dependent models has received by far the most attention. One of the major problems of these approaches is the classification of the speech data into rate classes. To investigate the problematic aspects of this classification, extensive experiments were carried out on a German corpus of read speech. The results indicate that the kind of model-driven speech-rate measure is of only minor importance, whereas a data-driven classification of the speech data significantly improves the performance of rate-dependent models. Further results suggest that a detailed modelling of speech rate can be based on more general models. This means that it might be possible to model speech rate adaptation by means of a transformation based on a continuous measure.
