Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis

Abstract: Natural singing-voice synthesis requires a fundamental frequency (F0) control model that can handle the F0 dynamic characteristics that affect singing-voice perception. This paper discusses the importance of F0 dynamic characteristics in singing voices and uses psychoacoustic experiments to demonstrate how strongly they influence singing-voice perception. Based on these considerations, it then proposes an F0 control model that generates F0 contours of singing voices, together with a singing-voice synthesis system. The results show that several types of F0 fluctuation (overshoot, vibrato, preparation, and fine fluctuation) affect the perception and quality of a singing voice, and that overshoot has the greatest effect. The results also show that the proposed model can control these F0 fluctuations, generate F0 contours of singing voices, and be applied to natural singing-voice synthesis.
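As a rough illustration of the kinds of F0 fluctuation named in the abstract, the sketch below builds a toy F0 contour for a single note change. It is not the model proposed in the paper: the abstract gives no equations, so everything here is an assumption made for illustration. Overshoot is approximated by the step response of an underdamped second-order system, vibrato by a sinusoid, and fine fluctuation by small Gaussian noise; preparation is left out for brevity. All function names, parameter values, and the choice to express deviations in Hz are hypothetical.

```python
import math
import random


def overshoot_transition(t, f_start, f_target, omega=40.0, zeta=0.5):
    """Step response of an underdamped second-order system (assumed form).

    Moves F0 from f_start to f_target (Hz) and briefly exceeds the target
    when zeta < 1, which mimics overshoot after a note change.
    """
    if t < 0:
        return f_start
    wd = omega * math.sqrt(1.0 - zeta ** 2)      # damped angular frequency
    envelope = math.exp(-zeta * omega * t)       # decay of the oscillation
    step = 1.0 - envelope * (
        math.cos(wd * t) + (zeta * omega / wd) * math.sin(wd * t)
    )
    return f_start + (f_target - f_start) * step


def f0_contour(f_start=220.0, f_target=262.0, duration=1.0, frame_rate=200,
               vibrato_rate=6.0, vibrato_depth_hz=3.0, noise_hz=0.5, seed=0):
    """Toy F0 contour: overshoot + vibrato + fine fluctuation (all assumed)."""
    rng = random.Random(seed)                    # seeded for reproducibility
    contour = []
    for n in range(int(duration * frame_rate)):
        t = n / frame_rate
        f0 = overshoot_transition(t, f_start, f_target)
        # Vibrato: quasi-periodic modulation, here a plain sinusoid.
        f0 += vibrato_depth_hz * math.sin(2 * math.pi * vibrato_rate * t)
        # Fine fluctuation: small irregular deviation, here Gaussian noise.
        f0 += rng.gauss(0.0, noise_hz)
        contour.append(f0)
    return contour


if __name__ == "__main__":
    contour = f0_contour()
    print(len(contour), "frames; first value:", round(contour[0], 2))
```

Setting `vibrato_depth_hz` or `noise_hz` to zero, or raising `zeta` to remove the overshoot, gives contours with individual fluctuations suppressed, which is the kind of comparison a listening test on these components would need.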
