A Chinese text-to-speech synthesis system based on an initial-final model

An experimental text-to-speech software system for Chinese has been developed. The system is based on an initial-final model, under which only 58 phonemes and semisyllables are needed to synthesize unrestricted Chinese words. Input is entered in standard Pinyin on a conventional keyboard. Because tone is an essential acoustic feature of every spoken Chinese word, four basic tonal variation patterns have been developed and incorporated into the synthesis process, and some principal tone sandhi rules are also applied. To improve the intelligibility and quality of the output speech, mixed excitation is used for unaspirated fricatives, and spectral zeros are added to the vocal tract model where appropriate for nasals and some consonants.
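
As an illustration of the initial-final idea, the following minimal Python sketch splits a tone-numbered Pinyin syllable into its initial, final, and tone, and then applies one well-known sandhi rule (a third tone followed by another third tone is spoken as a second tone). This is not the system's actual implementation: the function names, the tone-digit input format (e.g. "ni3"), and the treatment of syllables spelled with "y" or "w" as having no initial are all assumptions made for the example, and the abstract does not say which sandhi rules the system covers.

```python
# Hypothetical sketch, not the paper's code.
# Standard Pinyin initials; two-letter ones come first so "zh" wins over "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def split_syllable(syllable):
    """Split a syllable such as 'zhong1' into (initial, final, tone)."""
    body, tone = syllable[:-1], int(syllable[-1])
    if tone not in (1, 2, 3, 4):
        raise ValueError(f"unsupported tone in {syllable!r}")
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    # Assumption: anything else (including y-/w- spellings) has no initial.
    return "", body, tone


def apply_third_tone_sandhi(syllables):
    """Change tone 3 to tone 2 when the next syllable is also tone 3."""
    parsed = [split_syllable(s) for s in syllables]
    result = []
    for i, (initial, final, tone) in enumerate(parsed):
        # Look at the ORIGINAL tone of the following syllable, if any.
        next_tone = parsed[i + 1][2] if i + 1 < len(parsed) else None
        if tone == 3 and next_tone == 3:
            tone = 2
        result.append((initial, final, tone))
    return result


if __name__ == "__main__":
    print(split_syllable("zhong1"))
    print(apply_third_tone_sandhi(["ni3", "hao3"]))
```

In a synthesizer along these lines, each (initial, final) pair would select stored synthesis units, while the tone number would select one of the four tonal variation patterns to impose on the final.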