Modeling intensity contours and the interaction of pitch and intensity to improve automatic prosodic event detection and classification

Prosody, or the way words are spoken, carries information that is important for understanding a speaker's communicative intention. Many studies of automatic prosodic analysis focus on parameterizing pitch content. In this work, we extend previous pitch contour modeling features to intensity contours and develop a set of features based on the interaction of pitch and intensity. These new features improve on the state of the art in all prosodic event detection and classification tasks related to automatic ToBI labeling.
