Neural-network-based F0 text-to-speech synthesiser for Mandarin

Abstract: A neural-network-based approach to synthesising F0 information for Mandarin text-to-speech is discussed. The basic idea is to use neural networks to model the relationship between linguistic features extracted from the input text and parameters representing the pitch contours of syllables. Two multilayer perceptrons (MLPs) synthesise the mean and the shape of the pitch contour separately, each using a different set of linguistic features. A large set of utterances is used to train these MLPs with the well-known back-propagation algorithm. Pronunciation rules for generating F0 information are learned automatically and memorised implicitly by the MLPs. In synthesis, parameters representing the mean and shape of the pitch contour of each syllable are generated from linguistic features extracted from the given input text. Simulation results confirmed that this is a promising approach to F0 synthesis: the synthesised pitch contours of syllables match their original counterparts well. Average root-mean-square errors of 0.94 ms/frame and 1.00 ms/frame were achieved.
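The abstract does not say how the mean and shape of a syllable's pitch contour are parameterised, which linguistic features feed each MLP, or how the networks are sized. The Python sketch below illustrates the overall scheme under explicitly assumed choices. A contour is a sequence of pitch-period values in ms/frame. Its "mean" is the frame average, and its "shape" is the first three coefficients of a discrete orthonormal polynomial expansion. Each MLP has one sigmoid hidden layer and linear outputs and is trained by per-sample back-propagation. A one-hot tone vector stands in for the real linguistic features. All helper names (`encode`, `decode`, `toy_contour`, `train`) and the toy contours are hypothetical.

```python
import math
import random


def orthonormal_basis(n_frames, order):
    """Orthonormal polynomial basis of degrees 0..order over n_frames evenly spaced points.

    Built with modified Gram-Schmidt on 1, t, t^2, ... (t in [0, 1]); requires n_frames > order.
    """
    ts = [i / (n_frames - 1) for i in range(n_frames)]
    basis = []
    for k in range(order + 1):
        v = [t ** k for t in ts]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis


def encode(contour, order=3):
    """Split a contour into (mean, shape); shape = coefficients on basis vectors 1..order (assumed)."""
    basis = orthonormal_basis(len(contour), order)
    mean = sum(contour) / len(contour)
    shape = [sum(x * b for x, b in zip(contour, basis[k])) for k in range(1, order + 1)]
    return mean, shape


def decode(mean, shape, n_frames):
    """Rebuild a contour of n_frames values from its mean and shape coefficients."""
    basis = orthonormal_basis(n_frames, len(shape))
    return [mean + sum(c * basis[k + 1][i] for k, c in enumerate(shape)) for i in range(n_frames)]


def rmse(a, b):
    """Root-mean-square error per frame between two equal-length contours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


class MLP:
    """One sigmoid hidden layer, linear outputs, trained by per-sample back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # The last weight in each row multiplies a constant 1.0 input (bias).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, xb)))) for row in self.w1]
        hb = h + [1.0]
        y = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target, lr):
        """One gradient step on 0.5 * squared error; returns the loss before the update."""
        xb, hb, y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        # Hidden deltas are computed with the output weights before they change.
        delta_h = [hb[j] * (1.0 - hb[j]) * sum(e * row[j] for e, row in zip(err, self.w2))
                   for j in range(len(self.w1))]
        for e, row in zip(err, self.w2):
            for j in range(len(row)):
                row[j] -= lr * e * hb[j]
        for d, row in zip(delta_h, self.w1):
            for i in range(len(row)):
                row[i] -= lr * d * xb[i]
        return 0.5 * sum(e * e for e in err)


def train(net, data, epochs=3000, lr=0.1):
    """Plain back-propagation; returns the summed loss of each epoch."""
    return [sum(net.train_step(x, t, lr) for x, t in data) for _ in range(epochs)]


def toy_contour(tone, n=20):
    """Made-up pitch-period contours (ms/frame) for the four lexical tones; illustrative only."""
    curves = {1: lambda u: 6.0, 2: lambda u: 7.0 - 2.0 * u,
              3: lambda u: 7.0 + 3.0 * u * (1.0 - u), 4: lambda u: 5.0 + 3.0 * u}
    return [curves[tone](i / (n - 1)) for i in range(n)]


if __name__ == "__main__":
    tones = [1, 2, 3, 4]
    feats = {t: [1.0 if t == k else 0.0 for k in tones] for t in tones}  # one-hot tone only
    params = {t: encode(toy_contour(t)) for t in tones}
    mean_net, shape_net = MLP(4, 6, 1, seed=1), MLP(4, 6, 3, seed=2)
    train(mean_net, [(feats[t], [params[t][0]]) for t in tones])
    train(shape_net, [(feats[t], params[t][1]) for t in tones])
    for t in tones:
        mean = mean_net.forward(feats[t])[2][0]
        shape = shape_net.forward(feats[t])[2]
        print(t, round(rmse(decode(mean, shape, 20), toy_contour(t)), 3))
```

Running the script trains a "mean" network and a "shape" network on four made-up tone contours. It then prints the per-frame RMSE of each resynthesised contour, which is the same kind of measure the abstract reports. In the described system the two networks receive different linguistic features. Here they share the toy one-hot input only to keep the sketch short.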
