Prosody modification and Fujisaki's model: Preserving natural soundness

Control of prosodic characteristics is one of the most important problems in speech synthesis. Fujisaki's model is arguably the best available model of pitch variation, and its inversion is well suited to integration within speech synthesizers. This paper proposes a speech synthesis method based on Fujisaki's model (combining direct and inverse modeling) that aims to preserve the natural sound of synthesized speech. The idea is to modify a pitch contour on the basis of Fujisaki's features and a reference contour. Experimental results show that applying constraints derived from Fujisaki's model yields natural-sounding synthesized speech.
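For readers unfamiliar with the direct form of Fujisaki's model, the sketch below generates a pitch contour from phrase and accent commands using the standard textbook formulation, in which the logarithm of F0 is a baseline plus phrase and accent components. It is not the method proposed in this paper, and it does not cover the inversion step. The function names, the command values, and the constants `alpha`, `beta`, and `gamma` are illustrative assumptions.

```python
import math

def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase control mechanism, Gp(t)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, Ga(t), capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(times, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at each time in `times`.

    fb:      baseline frequency in Hz
    phrases: list of (onset_time, amplitude) phrase commands
    accents: list of (onset_time, offset_time, amplitude) accent commands
    """
    contour = []
    for t in times:
        # The model is additive in the log-frequency domain.
        log_f0 = math.log(fb)
        for t0, ap in phrases:
            log_f0 += ap * phrase_response(t - t0, alpha)
        for t1, t2, aa in accents:
            log_f0 += aa * (accent_response(t - t1, beta, gamma)
                            - accent_response(t - t2, beta, gamma))
        contour.append(math.exp(log_f0))
    return contour

# Illustrative values only (assumed, not taken from the paper):
# one phrase command at t = 0 s and one accent command from 0.3 s to 0.6 s.
times = [i * 0.01 for i in range(200)]
contour = fujisaki_f0(times, fb=100.0,
                      phrases=[(0.0, 0.5)],
                      accents=[(0.3, 0.6, 0.4)])
```

Because the contour is fully determined by a small set of command timings and amplitudes, a modification can be expressed as a change to those parameters rather than to individual F0 samples, which is the kind of constraint the abstract refers to.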
