Text-based unstressed syllable prediction in Mandarin

Mandarin word stress has recently received increasing attention because it is important for improving the naturalness of synthesized speech. Most research on Mandarin speech synthesis distinguishes three stress levels: stressed, regular, and unstressed. This paper focuses on predicting unstressed syllables, because they also affect the intelligibility of synthetic speech. As with prosodic structure, stress is difficult to predict from text analysis because the relevant context information is complex. We propose a method based on the Classification and Regression Tree (CART) model that predicts unstressed syllables with an accuracy of 85%. The method has been integrated into a TTS system. Experiments show that the MOS of the synthetic speech improves by 0.35 and that the pitch contour of the newly synthesized speech is closer to that of natural speech.

Index Terms: Text-to-Speech, stress, unstressed syllable, prosody
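To make the CART idea concrete, the following is a minimal sketch of a classification tree that labels a syllable as unstressed or not from text-derived context features. It is an illustration only, not the implementation evaluated in this paper: the feature names (tone, pos, position), the equality-based binary splits, the Gini criterion, and the six training rows are all assumptions invented for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, value) equality test that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for feat in rows[0]:
        for val in sorted({r[feat] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[feat] == val]
            right = [y for r, y in zip(rows, labels) if r[feat] != val]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:  # require a strict improvement
                best, best_score = (feat, val), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree recursively; a leaf is the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feat, val = split
    yes = [(r, y) for r, y in zip(rows, labels) if r[feat] == val]
    no = [(r, y) for r, y in zip(rows, labels) if r[feat] != val]
    return {
        "test": split,
        "yes": build_tree([r for r, _ in yes], [y for _, y in yes], depth + 1, max_depth),
        "no": build_tree([r for r, _ in no], [y for _, y in no], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the equality tests from the root down to a leaf label."""
    while isinstance(tree, dict):
        feat, val = tree["test"]
        tree = tree["yes"] if row[feat] == val else tree["no"]
    return tree

# Hypothetical toy data (NOT from the paper): one dict of context features per syllable.
TRAIN = [
    ({"tone": "neutral", "pos": "particle", "position": "single"}, "unstressed"),
    ({"tone": "neutral", "pos": "noun", "position": "final"}, "unstressed"),
    ({"tone": "4", "pos": "noun", "position": "initial"}, "other"),
    ({"tone": "1", "pos": "verb", "position": "single"}, "other"),
    ({"tone": "4", "pos": "preposition", "position": "single"}, "unstressed"),
    ({"tone": "2", "pos": "noun", "position": "final"}, "other"),
]
rows = [features for features, _ in TRAIN]
labels = [label for _, label in TRAIN]
tree = build_tree(rows, labels)

print(predict(tree, {"tone": "neutral", "pos": "noun", "position": "final"}))
```

On this toy data the tree first tests whether the syllable carries the neutral tone and then whether it belongs to a preposition. A real system would train on a stress-annotated corpus with a much richer feature set, and would typically add pruning to avoid overfitting.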