Robust speaking rate estimation using broad phonetic class recognition

Robust speaking rate estimation is useful in automatic speech recognition and speaker identification, and accurate, automatic measures of speaking rate are also relevant to research in linguistics, psychology, and the social sciences. In this study, we built a broad phonetic class recognizer for speaking rate estimation. We tested the recognizer on a variety of data sets, including laboratory speech, telephone conversations, foreign-accented speech, and speech in different languages, and found that its estimates are robust to these sources of variation. We also found that, for syllable detection, the acoustic models of the broad phonetic classes are more robust than those of the monophones.
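As a rough illustration of how a recognizer's output could be turned into a rate estimate, the sketch below counts vowel-class segments as syllable nuclei and divides by the duration. The segment format, the class labels (`"V"`, `"C"`, `"SIL"`), and the function name `speaking_rate` are assumptions made for this example; the abstract does not specify the class inventory or the exact rate formula used in the study.

```python
from typing import List, Tuple

# Hypothetical recognizer output: (start_sec, end_sec, broad_class_label).
Segment = Tuple[float, float, str]


def speaking_rate(
    segments: List[Segment],
    vowel_label: str = "V",       # assumed label for the vowel class
    silence_label: str = "SIL",   # assumed label for silence/pause
    include_silence: bool = True,
) -> float:
    """Return syllables per second, treating each vowel segment as one syllable."""
    syllables = sum(1 for _, _, label in segments if label == vowel_label)
    if include_silence:
        # Rate over the whole stretch, pauses included.
        duration = sum(end - start for start, end, _ in segments)
    else:
        # Rate over speech only, pauses excluded.
        duration = sum(
            end - start for start, end, label in segments if label != silence_label
        )
    if duration <= 0:
        raise ValueError("total duration must be positive")
    return syllables / duration


# Toy example: two vowel segments in one second of audio.
example = [
    (0.0, 0.2, "SIL"),
    (0.2, 0.3, "C"),
    (0.3, 0.5, "V"),
    (0.5, 0.6, "C"),
    (0.6, 0.8, "V"),
    (0.8, 1.0, "SIL"),
]
rate_with_pauses = speaking_rate(example)
rate_speech_only = speaking_rate(example, include_silence=False)
```

Whether pauses count toward the duration changes the result considerably, so the choice is exposed here as a parameter rather than fixed.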
