Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin

Abstract: This paper presents an approach to structural modeling of the voice fundamental frequency contours (F0 contours) of Mandarin utterances as a sequence of modulated tones. The proposed functional model implements tone modulation mathematically through both local and global controls. The local control places a series of normalized F0 targets along the time axis; the targets are specified by transition times and amplitudes and are always reached, and the transitions between targets are approximated by connecting truncated second-order transition functions. The global control, expressed in terms of sentence modality, simply compresses or expands the heights and ranges of the prototypical syllabic tone patterns generated by the local control. Both controls are integrated in a unified framework, and the paper explains the underlying scientific and linguistic principles. Analysis of 1044 utterances of various sentences read by eight native speakers showed that the model can closely approximate the observed F0 contours with a small number of parameters. These parameters are localized and well suited to a data-driven fitting process. As will be demonstrated, the model is also promising for measuring intonation variations from observed F0 contours.
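The two controls described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the transition function or the global mapping, so the code assumes a critically damped second-order step response, truncated at the end of each transition and rescaled so that every target is reached exactly, and it assumes the global control is a linear mapping on a log-frequency scale between a floor and a ceiling. The names `transition`, `local_contour`, `global_control`, and the `steepness` parameter are hypothetical.

```python
import math


def transition(u, steepness=5.0):
    """Assumed transition shape: a critically damped second-order step
    response, truncated at u = 1 and rescaled so it equals 1 there.
    u is the normalized time within one transition, clipped to [0, 1]."""
    u = min(max(u, 0.0), 1.0)

    def raw(x):
        return 1.0 - (1.0 + steepness * x) * math.exp(-steepness * x)

    return raw(u) / raw(1.0)


def local_contour(targets, times):
    """Local control (sketch): targets is a list of (time, amplitude) pairs
    with strictly increasing times and amplitudes normalized to [0, 1].
    Returns the normalized F0 value at each sample time."""
    values = []
    for t in times:
        if t <= targets[0][0]:
            values.append(targets[0][1])
            continue
        if t >= targets[-1][0]:
            values.append(targets[-1][1])
            continue
        for (t0, a0), (t1, a1) in zip(targets, targets[1:]):
            if t0 <= t <= t1:
                u = (t - t0) / (t1 - t0)
                values.append(a0 + (a1 - a0) * transition(u))
                break
    return values


def global_control(normalized, floor_hz, ceiling_hz):
    """Global control (assumed form): map normalized values onto a log-F0
    scale between floor_hz and ceiling_hz. Changing the floor or ceiling
    shifts the height or compresses/expands the range of the pattern."""
    lo, hi = math.log(floor_hz), math.log(ceiling_hz)
    return [math.exp(lo + v * (hi - lo)) for v in normalized]


# Illustrative values only; they are not taken from the paper's data.
targets = [(0.0, 0.2), (0.2, 0.8), (0.4, 0.3)]
times = [i * 0.05 for i in range(9)]
pattern = local_contour(targets, times)
wide = global_control(pattern, 100.0, 200.0)    # expanded range
narrow = global_control(pattern, 120.0, 160.0)  # compressed range
```

In this sketch the same normalized pattern is reused for both ranges, which mirrors the abstract's statement that the global control only rescales the prototypical patterns produced by the local control.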
