Acoustic characteristics of lexical stress in continuous telephone speech

Abstract: In this paper we investigate acoustic differences between vowels in syllables that do or do not carry lexical stress. We concentrate on segmental acoustic-phonetic features that are conventionally assumed to differ between stressed and unstressed syllables, viz. Duration, Energy and Spectral Tilt. The speech material in this study differs from the type of material used in previous research: instead of specially constructed sentences, we used phonetically rich sentences from the Dutch POLYPHONE corpus. Most of the Duration, Energy and Spectral Tilt features that we examined show statistically significant differences between the population means of stressed and unstressed vowels. However, the distributions overlap to such an extent that automatic classification of syllables as stressed or unstressed is correct in 72.6% of cases at best. We argue that this result is due to the large variety of ways in which the abstract linguistic feature 'lexical stress' is realized in the acoustic speech signal. Our findings suggest that a lexical stress detector is of little use in a single-pass decoder in an automatic speech recognition (ASR) system, but could still play a useful role as an additional knowledge source in a multi-pass decoder.
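The contrast between significant mean differences and modest classification accuracy can be illustrated with a small calculation. The sketch below is not the method used in the paper. It assumes, purely for illustration, a single feature that is Gaussian in both classes with equal variance and equal class priors, and it asks how far apart the class means would have to be for the best threshold to reach 72.6% accuracy. The function names and the sample size of 1000 vowels per class are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sigma 1


def best_threshold_accuracy(d: float) -> float:
    """Accuracy of the midpoint threshold for two equally likely,
    equal-variance Gaussian classes whose means differ by d sigmas.
    (Illustrative assumption, not the paper's classifier.)"""
    return STD_NORMAL.cdf(d / 2.0)


def separation_for_accuracy(accuracy: float) -> float:
    """Inverse of best_threshold_accuracy: the mean separation (in sigmas)
    at which the best threshold reaches the given accuracy."""
    return 2.0 * STD_NORMAL.inv_cdf(accuracy)


def expected_t_statistic(d: float, n_per_class: int) -> float:
    """Approximate two-sample t statistic for effect size d with
    n_per_class observations in each class."""
    return d * math.sqrt(n_per_class / 2.0)


# Separation implied by the 72.6% figure under the toy assumptions above.
d = separation_for_accuracy(0.726)
# Hypothetical sample size: 1000 stressed and 1000 unstressed vowels.
t = expected_t_statistic(d, 1000)
print(f"separation: {d:.2f} sigma, expected t: {t:.1f}")
```

Under these assumptions, a separation of a little over one standard deviation already produces a very large t statistic with a thousand tokens per class, while about a quarter of the tokens still fall on the wrong side of the threshold. This is consistent with the abstract's observation that significance of the population means does not imply reliable per-syllable detection.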
