Novel eigenpitch-based prosody model for text-to-speech synthesis

Prosody is an inherent suprasegmental feature of speech that human speakers use to express, for example, attitude, emotion, intent and attention. In text-to-speech (TTS) systems, high naturalness can only be achieved if the prosody of the output is appropriate. Prosody is even more crucial for tonal languages, such as Mandarin Chinese, in which the tone of each syllable is described by its pitch contour. In this paper, we propose a novel prosody modeling approach based on the concept of syllable-based eigenpitch. The approach has been implemented in our Mandarin TTS system, resulting in less than 0.1% error variance. Practical experiments confirm the good performance of the proposed technique.
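The abstract does not spell out how an eigenpitch is computed. As a rough illustration only, the sketch below assumes that eigenpitches are the principal components of fixed-length, mean-removed syllable pitch contours, and that "error variance" means the ratio of residual to total variance after reconstruction from a few components. Both are assumptions, not the method as published. All function names and the synthetic data are hypothetical, and the eigenvectors are found by plain power iteration with deflation to keep the example dependency-free.

```python
import math
import random


def synthetic_contours(n=200, points=8, seed=1):
    """Hypothetical data: each contour is a level offset plus slope and curvature."""
    rng = random.Random(seed)
    t = [i / (points - 1) - 0.5 for i in range(points)]
    contours = []
    for _ in range(n):
        level, slope, curve = rng.gauss(200, 20), rng.gauss(0, 30), rng.gauss(0, 40)
        contours.append([level + slope * x + curve * x * x for x in t])
    return contours


def mean_contour(contours):
    """Average the contours point by point."""
    n = len(contours)
    return [sum(c[i] for c in contours) / n for i in range(len(contours[0]))]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return v, sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient


def eigenpitch_basis(contours, k):
    """Return the mean contour and the k leading eigenvectors ("eigenpitches")."""
    mean = mean_contour(contours)
    centered = [[x - m for x, m in zip(c, mean)] for c in contours]
    n, d = len(centered), len(mean)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    basis = []
    for _ in range(k):
        v, lam = top_eigenvector(cov)
        basis.append(v)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, basis


def encode(contour, mean, basis):
    """Project a contour onto the eigenpitches; returns one weight per component."""
    centered = [x - m for x, m in zip(contour, mean)]
    return [sum(b * x for b, x in zip(vec, centered)) for vec in basis]


def decode(weights, mean, basis):
    """Rebuild a contour from its eigenpitch weights."""
    out = list(mean)
    for w, vec in zip(weights, basis):
        out = [o + w * b for o, b in zip(out, vec)]
    return out


def error_variance_ratio(contours, mean, basis):
    """Residual variance after reconstruction, relative to total variance."""
    residual = total = 0.0
    for c in contours:
        approx = decode(encode(c, mean, basis), mean, basis)
        residual += sum((x - y) ** 2 for x, y in zip(c, approx))
        total += sum((x - m) ** 2 for x, m in zip(c, mean))
    return residual / total
```

In this toy setting each syllable contour is reduced to a handful of weights, and `error_variance_ratio` shrinks as more eigenpitches are kept. The figure it produces depends entirely on the synthetic data and should not be compared with the result reported above.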
