Paper Information - Word Prominence Detection using Robust yet Simple Prosodic Features

Word Prominence Detection using Robust yet Simple Prosodic Features

Automatic detection of word prominence can provide valuable information for downstream applications such as spoken language understanding. Prior work on automatic word prominence detection exploits a variety of lexical, syntactic, and prosodic features and models the task as a sequence labeling problem (independently or using context). While lexical and syntactic features are highly correlated with the notion of word prominence, the output of speech recognition is typically noisy, and hence these features are less reliable than the acoustic-prosodic feature stream. In this work, we address the automatic detection of word prominence through novel prosodic features that capture the changes in F0 curve shape and magnitude in conjunction with duration and energy. We contrast the utility of these features with aggregate statistics of F0, duration, and energy used in prior work. Our features are simple to compute yet robust to the inherent difficulties associated with identifying salient points (such as F0 peaks) in the F0 contour. Feature analysis demonstrates that these novel features are significantly more predictive than the standard aggregation-based prosodic features. Experimental results on a corpus of spontaneous speech indicate that prominence detection accuracy using only the new prosodic features is better than using both lexical and syntactic features.

Taniya Mishra | Vivek Kumar Rangarajan Sridhar | Alistair Conkie

[1] Shrikanth S. Narayanan, et al. Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[2] Shrikanth Narayanan, et al. Detecting prominence in conversational speech: pitch accent, givenness and focus, 2008, Speech Prosody 2008.

[3] Graham J. Williams, et al. Rattle: A Data Mining GUI for R, 2009, R J.

[4] Xuejing Sun, et al. Pitch accent prediction using ensemble machine learning, 2002, INTERSPEECH.

[5] Klaus Nordhausen, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition by Trevor Hastie, Robert Tibshirani, Jerome Friedman, 2009.

[6] Paul Taylor, et al. The tilt intonation model, 1998, ICSLP.

[7] Stefanie Shattuck-Hufnagel, et al. A prosodically labeled database of spontaneous speech, 2001.

[8] Paul Christopher Bagshaw, et al. Automatic prosodic analysis for computer aided pronunciation teaching, 1994.

[9] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2004.

[10] Julia Hirschberg, et al. Detecting Pitch Accents at the Word, Syllable and Vowel Level, 2009, NAACL.

[11] N. M. Veilleux, et al. Prosody/Parse Scoring and Its Application in ATIS, 1993, HLT.