Mandarin spontaneous narrative planning - prosodic evidence from National Taiwan University Lecture Corpus

This paper discusses discourse planning of pre-organized spontaneous narratives (SpnNS) in comparison with read speech (RS). F0 and tempo modulations are compared by speech paragraph size and discourse boundaries. The speaking rate of SpnNS from university classroom lectures is 2 to 3 times that of RS by professionals; paragraph phrasing of SpnNS is 6 times that of RS. Patterns of paragraph association are distinct for SpnNS and RS. Sub-paragraph and paragraph units are marked by distinct relative F0 resets and boundary pause durations in RS, but by patterns of intensity contrasts in SpnNS. Consistent across both data sets is the finding that combined relative supra-segmental cues reflecting global prosodic properties discriminate discourse boundaries better than any single cue in isolation, which supports the presence of higher-level discourse planning in the acoustic signal. We believe these findings can be directly applied to speech technology development.
