Relations between prominence and articulatory-prosodic cues in emotional speech

This study investigates the relations between the degree of prominence and articulatory-prosodic cues in emotional speech. In particular, it considers articulatory parameters derived from the Converter/Distributor (C/D) model. The goal is to better understand the links among syllable magnitude in the C/D model, the empirical measure of it used in the literature, and syllable-level prominence, and to examine how these relations vary with emotion. Since prosodic variations are important cues for prominence and emotion in speech, relations with prosodic parameters (f0, energy, duration) are also considered. Electromagnetic articulography data from two speakers were used for analysis. The degree of prominence was computed from crowd-sourced annotation data collected using Rapid Prosody Transcription. Results indicate that movements of the linguistically critical articulator, energy, and the syllable magnitude measure are highly correlated with prominence, whereas f0 is less correlated. The movements of the linguistically critical articulator tend to be more correlated with prominence than the syllable magnitude measure is. Inter-speaker variability and emotion-dependent variations are also reported. These results suggest complex relations between prominence and articulatory-prosodic cues. They also suggest that incorporating a wider range of articulatory and prosodic behaviors than is conventional can better account for the perception of prominence.
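
The kind of analysis described above can be illustrated with a minimal sketch. It assumes that the degree of prominence of a unit is the fraction of annotators who marked it as prominent (a common way to score Rapid Prosody Transcription data; the abstract does not state the exact formula) and that the relation to a cue is measured with a Pearson correlation (also an assumption). The function names and the toy numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def prominence_scores(annotations):
    """Fraction of annotators marking each unit as prominent.

    annotations: one list per annotator, each holding a 0/1 mark per unit.
    (Assumed scoring scheme; the abstract does not specify the formula.)
    """
    n_annotators = len(annotations)
    n_units = len(annotations[0])
    return [
        sum(marks[i] for marks in annotations) / n_annotators
        for i in range(n_units)
    ]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy data: four annotators marking four units (made-up values).
annotations = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
scores = prominence_scores(annotations)

# Hypothetical per-unit energy values, for illustration only.
energy = [0.8, 0.2, 0.4, 0.9]
r = pearson(scores, energy)
print(scores, round(r, 3))
```

The same correlation could be computed against any other per-unit cue (f0, duration, or an articulatory measure) by swapping out the `energy` list.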
