Analysis of prosody increment induced by pitch accents for automatic emphasis correction

We are interested in developing an automatic emphasis correction system that converts any unemphasized word in an utterance into an emphasized one. Analyzing how prosody changes when a word goes from unaccented to accented is crucial for this task. Whereas previous work on prosody reconstruction models only the prosody contour itself rather than the increment, we propose a framework for studying the prosody increment induced by pitch accents in real speech in a statistically rigorous manner. The framework also infers the degree of emphasis of each word to account for the additional prosody variation due to metalinguistic factors. The analysis yields insights into the prosody increment that are consistent with many existing studies on pitch accent and emphasis.
