The Relation Between Stress Accent and Vocalic Identity in Spontaneous American English Discourse

There is a systematic relationship between stress accent and vocalic identity in spontaneous American English discourse, as observed in the Switchboard corpus of telephone dialogues. Low vowels are much more likely to be fully accented than their high vocalic counterparts. Conversely, high vowels are far more likely to lack stress accent than low or mid vocalic segments. Such patterns imply that stress accent and vocalic identity (particularly vowel height) are bound together at some level of lexical representation. Statistical analysis of the manually annotated Switchboard material indicates that vocalic duration is likely to serve as an important acoustic cue for stress accent, particularly for diphthongs and the low, tense monophthongs. In addition, multilayer perceptrons (MLPs) were trained on a portion of this annotated material in order to automatically label the corpus with respect to stress accent. The automatically derived labels are highly concordant with those of human transcribers (79% concordant within a quarter-step of accent level and 97.5% concordant within a half-step). Achieving such a high degree of concordance requires features pertaining not only to the duration and amplitude of the vocalic nuclei, but also to speaker gender, syllabic duration and, most importantly, vocalic identity. Such results suggest that vocalic identity is intimately associated with stress accent in spontaneous American English (and vice versa), thereby providing a potential foundation for modeling pronunciation variation in automatic speech recognition.
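
As an illustration of how concordance figures of this kind can be computed, the following is a minimal sketch, not the authors' evaluation code. It assumes that accent levels are coded numerically on a 0-to-1 scale, so that a quarter-step corresponds to a tolerance of 0.25 and a half-step to 0.5. The function name, the scale, and the sample labels are all hypothetical.

```python
def concordance(auto_labels, human_labels, tolerance):
    """Fraction of items whose automatic label lies within `tolerance`
    of the human label (assumed numeric accent scale, e.g. 0.0 to 1.0)."""
    if len(auto_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not auto_labels:
        raise ValueError("label sequences must be non-empty")
    hits = sum(
        1
        for auto, human in zip(auto_labels, human_labels)
        if abs(auto - human) <= tolerance
    )
    return hits / len(auto_labels)


# Hypothetical labels for four syllables (illustrative values only).
human = [0.0, 0.5, 1.0, 0.25]
auto = [0.0, 0.75, 0.5, 0.25]

quarter_step = concordance(auto, human, tolerance=0.25)
half_step = concordance(auto, human, tolerance=0.5)
```

A tolerance-based measure of this sort credits near misses, which is why the half-step figure is always at least as high as the quarter-step figure.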