Combining multiple estimators of speaking rate

We report progress in the development of a measure of speaking rate that is computed directly from the acoustic signal. The newest form of our analysis combines multiple estimates of rate: in addition to the spectral moment of a full-band energy envelope, which we have previously reported, it uses the pointwise correlation between pairs of compressed sub-band energy envelopes. The complete measure, called mrate, has been compared to a reference syllable rate derived from a manually transcribed subset of the Switchboard database. Its correlation with the transcribed syllable rate is significantly higher than that of our earlier measure, and its estimates are typically within 1-2 syllables/second of the reference syllable rate. We conclude by assessing the use of mrate as a detector for rapid speech.
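The two ingredients named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 16 Hz modulation cutoff, the square-root compression, and the plain averaging of the individual estimates are all assumptions made for illustration, and the abstract does not say how mrate actually combines its estimators. The sketch takes energy envelopes that have already been computed and sampled at a fixed frame rate, and it treats the first spectral moment of an envelope (in Hz) as a stand-in for a rate in syllables/second.

```python
import cmath
import math


def spectral_moment(envelope, frame_rate, max_hz=16.0):
    """First moment (power-weighted mean frequency, in Hz) of an envelope's spectrum.

    Assumption: only modulation frequencies in (0, max_hz] are considered.
    """
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [v - mean for v in envelope]  # remove the DC component
    weighted = 0.0
    total = 0.0
    k = 1
    # Naive DFT over the low-frequency bins only; fine for short envelopes.
    while k <= n // 2 and k * frame_rate / n <= max_hz:
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        power = abs(coeff) ** 2
        weighted += (k * frame_rate / n) * power
        total += power
        k += 1
    return weighted / total if total > 0 else 0.0


def pointwise_correlation(env_a, env_b, power=0.5):
    """Sample-by-sample product of two compressed envelopes.

    Assumption: compression is a power law with exponent `power`.
    """
    return [(a ** power) * (b ** power) for a, b in zip(env_a, env_b)]


def combined_rate(full_band, sub_bands, frame_rate):
    """Average the full-band estimate with one estimate per sub-band pair.

    Assumption: a plain mean is used only to show the idea of combining
    several estimators; the real combination rule is not given in the text.
    """
    estimates = [spectral_moment(full_band, frame_rate)]
    for i in range(len(sub_bands)):
        for j in range(i + 1, len(sub_bands)):
            product = pointwise_correlation(sub_bands[i], sub_bands[j])
            estimates.append(spectral_moment(product, frame_rate))
    return sum(estimates) / len(estimates)
```

As a sanity check, a synthetic envelope modulated at 4 Hz and sampled at 100 frames/second should give an estimate close to 4 from both the full-band moment and the combined measure. Real speech envelopes are far less regular, so this only demonstrates the mechanics.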
