Topic transitions and durational prosody in reading aloud: production and modeling

Abstract

The linguistic structure of an utterance is known to affect the durational prosody of sounds, words and phrases. There has been increasing interest in how discourse-level organization affects prosody, in part because modeling discourse-level effects could improve the comprehensibility of longer passages of synthesized text. The approach taken here is to look at how topics are sequenced in a text, and how this affects durational prosody when that text is read aloud. Two speakers of American English were recorded reading a set of text materials on 10 separate occasions. Measurements of these recordings indicated that the type of transition in topic between two successive sentences had a significant effect on the amount of sentence-final lengthening, the duration of the pause between sentences, and the speech rate at the end of a sentence and the beginning of the following sentence. These measurements were then used to create a mathematical model of one speaker, and to generate several versions of one of this speaker's original recordings, with each version incorporating different manipulations of the durational patterns and their variability. These versions were played to listeners, who preferred those where the manipulations included durational patterns reflecting the organization of topics in the text.
