To Memorize or to Predict: Prominence Labeling in Conversational Speech

The immense prosodic variation of natural conversational speech makes it challenging to predict which words are prosodically prominent in this genre. In this paper, we examine a new feature, accent ratio, which captures how likely it is that a word will be realized as prominent or not. We compare this feature with traditional accent-prediction features (based on part of speech and N-grams) as well as with several linguistically motivated and manually labeled information structure features, such as whether a word is given, new, or contrastive. Our results show that the linguistic features do not lead to significant improvements, while accent ratio alone can yield prediction performance almost as good as the combination of any other subset of features. Moreover, this feature is useful even across genres; an accent-ratio classifier trained only on conversational speech predicts prominence with high accuracy in broadcast news. Our results suggest that carefully chosen lexicalized features can outperform less fine-grained features.
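
As a rough illustration of the idea behind accent ratio, the sketch below estimates, for each word, the fraction of its training occurrences that were labeled prominent, and then predicts prominence by looking that value up. The abstract does not give the exact definition, so the plain relative frequency, the function names `accent_ratios` and `predict_prominent`, the 0.5 threshold, the 0.5 fallback for unseen words, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def accent_ratios(labeled_words):
    """Map each lowercased word to the fraction of its occurrences labeled prominent.

    `labeled_words` is an iterable of (word, is_prominent) pairs.
    Assumption: a plain relative frequency stands in for the paper's feature.
    """
    total = Counter()
    accented = Counter()
    for word, is_prominent in labeled_words:
        key = word.lower()
        total[key] += 1
        if is_prominent:
            accented[key] += 1
    return {word: accented[word] / count for word, count in total.items()}


def predict_prominent(word, ratios, default=0.5, threshold=0.5):
    """Predict prominence by lookup: True if the stored ratio exceeds the threshold.

    Assumption: unseen words fall back to `default`, which with the default
    threshold means they are predicted as not prominent.
    """
    return ratios.get(word.lower(), default) > threshold


# Hypothetical toy training data: (word, was it labeled prominent?)
train = [
    ("the", False), ("the", False), ("the", True),
    ("really", True), ("really", True),
    ("of", False),
]
ratios = accent_ratios(train)
```

With this toy data, `predict_prominent("really", ratios)` returns `True` and `predict_prominent("the", ratios)` returns `False`. A classifier of this kind memorizes per-word behavior rather than generalizing from part of speech or context, which is the contrast the title draws.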
