An RNN-based algorithm to detect prosodic phrase for Chinese TTS

This work aims to predict prosodic phrase boundaries automatically from text for Chinese text-to-speech (TTS). A recurrent neural network (RNN) takes as input the part-of-speech (POS) trigram together with information about the breaks between the two preceding word pairs. Prosodic phrase boundaries are very important to a Chinese TTS system because they influence the prosodic model used for speech synthesis. The algorithm uses the RNN to learn a mapping from the POS sequence to the prosodic phrase boundaries, with the aim of improving the naturalness of the synthesized speech.
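The input encoding and feedback loop described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation. The tag set `POS_TAGS`, the one-hot encoding, the alignment of the trigram around each juncture, and the names `juncture_features`, `predict_breaks`, and `toy_scorer` are all assumptions made for the example. In particular, `toy_scorer` is a hand-written stand-in for a trained network, so the sketch shows only how earlier break decisions are fed back as inputs.

```python
from typing import Callable, List, Sequence

# Hypothetical toy tag set; the actual POS inventory is not given in the text.
POS_TAGS = ["n", "v", "a", "d", "p", "u", "PAD"]
TAG_INDEX = {tag: i for i, tag in enumerate(POS_TAGS)}


def one_hot(tag: str) -> List[float]:
    """Encode a POS tag as a one-hot vector (assumed encoding)."""
    vec = [0.0] * len(POS_TAGS)
    vec[TAG_INDEX[tag]] = 1.0
    return vec


def juncture_features(pos: Sequence[str], i: int,
                      prev_breaks: Sequence[int]) -> List[float]:
    """Features for the juncture after word i.

    Concatenates the POS trigram (words i-1, i, i+1; an assumed alignment)
    with the break decisions at the two preceding junctures.
    """
    padded = ["PAD"] + list(pos) + ["PAD"]
    trigram = padded[i:i + 3]  # words i-1, i, i+1 in the original indexing
    feats: List[float] = []
    for tag in trigram:
        feats.extend(one_hot(tag))
    feats.extend(float(b) for b in prev_breaks[-2:])
    return feats


def predict_breaks(pos: Sequence[str],
                   scorer: Callable[[List[float]], float],
                   threshold: float = 0.5) -> List[int]:
    """Decode left to right, feeding each decision back into later inputs."""
    history = [0, 0]  # no breaks assumed before the sentence starts
    breaks: List[int] = []
    for i in range(len(pos) - 1):  # one juncture between each word pair
        x = juncture_features(pos, i, history)
        b = 1 if scorer(x) >= threshold else 0
        breaks.append(b)
        history.append(b)
    return breaks


def toy_scorer(x: List[float]) -> float:
    """Stand-in for a trained network (NOT an RNN): favors a break after a
    noun unless the previous juncture was already a break."""
    n = len(POS_TAGS)
    current_is_noun = x[n + TAG_INDEX["n"]]  # middle slot of the trigram
    prev_break = x[-1]                       # most recent fed-back decision
    return current_is_noun * (1.0 - prev_break)


example_pos = ["n", "v", "n", "n", "d"]
example_breaks = predict_breaks(example_pos, toy_scorer)
```

In a real system, `scorer` would be replaced by a network trained on a corpus annotated with prosodic breaks. The feedback through `history` is what lets a decision at one juncture depend on the two decisions before it.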
