Analytical Study on Fundamental Frequency Contours of Thai Expressive Speech Using Fujisaki's Model

Problem statement: In spontaneous speech communication, prosody is an important factor because it affects not only the naturalness but also the intelligibility of speech. A number of Thai speech synthesis systems have been developed over the years, but the generation of expressive speech in various speaking styles has not yet been accomplished. To generate expressive speech, the fundamental frequency (F0) contour must be modeled accurately so that the speech prosody is preserved. Approach: This study therefore analyzes model parameters of Thai speech prosody for three speaking styles and two genders, as preliminary work for speech synthesis. Fujisaki's model, a powerful tool for modeling F0 contours, was adopted, and the speaking styles of happiness, sadness and reading were considered. Seven parameters were derived from Fujisaki's model. The first is the baseline frequency, which is the lowest level of the F0 contour. The second and third are the numbers of phrase commands and tone commands, which reflect how frequently surges occur in the utterance at the global and local levels, respectively. The fourth and fifth are the phrase command and tone command durations, which reflect the speaking rate and the syllable length, respectively. The sixth and seventh are the amplitudes of the phrase commands and tone commands, which reflect the energy of the speech at the global level and of the syllable at the local level, respectively. Results: In the experiments, each speaking style was represented by 200 samples of one sentence for each gender, giving a speech database of 1200 utterances in total (3 styles x 2 genders x 200 samples). The results show that most of the proposed parameters clearly distinguish the three speaking styles.
Conclusion: These findings provide strong evidence that the successful parameters can be applied in speech synthesis systems and other speech processing technologies.
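
To make the role of the parameters concrete, the following is a minimal sketch of the standard Fujisaki command-response formulation, in which the logarithm of F0 is the sum of the log baseline frequency, the phrase components and the tone components. It is not the analysis procedure of this study. The function names, the constants ALPHA, BETA and GAMMA, and the example command values are all assumptions chosen for illustration, and phrase commands are treated as impulses, as in the standard model.

```python
import math

# Assumed, commonly quoted values; the study's own settings are not given here.
ALPHA = 2.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the tone control mechanism (1/s)
GAMMA = 0.9   # ceiling of the tone component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the tone control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, tone_cmds):
    """Return F0 values in Hz at the given times (seconds).

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset, amplitude) impulses
    tone_cmds   -- list of (onset, offset, amplitude) pedestals;
                   the amplitude may be negative for a tonal language
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for onset, amp in phrase_cmds:
            log_f0 += amp * phrase_response(t - onset)
        for onset, offset, amp in tone_cmds:
            log_f0 += amp * (tone_response(t - onset) - tone_response(t - offset))
        contour.append(math.exp(log_f0))
    return contour


if __name__ == "__main__":
    # Illustrative command values only, not measurements from the study.
    times = [i * 0.05 for i in range(21)]          # 0.0 s to 1.0 s
    phrase_cmds = [(0.0, 0.4)]                     # one phrase command
    tone_cmds = [(0.2, 0.4, 0.3), (0.6, 0.8, -0.2)]
    for t, f0 in zip(times, f0_contour(times, 120.0, phrase_cmds, tone_cmds)):
        print(f"{t:.2f} s  {f0:.1f} Hz")
```

In this sketch the baseline frequency sets the floor of the contour, the lengths of the two command lists correspond to the numbers of phrase and tone commands, each tone command's offset minus onset is its duration, and the amplitudes scale the global and local components.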
