Hierarchical prosodic boundary prediction for Uyghur TTS

Correct prosodic boundary prediction is crucial for the quality of synthesized speech. This paper presents the prosodic hierarchy of Uyghur, an agglutinative language. A two-layer, bottom-up hierarchical approach based on conditional random fields (CRF) is used to predict prosodic word (PW) and prosodic phrase (PP) boundaries. To resolve the confusion between different prosodic boundary types at punctuation sites, a CRF-based prosodic boundary determination model is integrated with the bottom-up hierarchical approach. Because word suffixes are considered useful cues for prosodic boundary prediction, a word suffix feature is added to the feature sets. The experimental results show that the proposed method resolves the confusion between different prosodic boundaries and thereby further improves the accuracy of prosodic boundary prediction.
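
The two-layer, bottom-up flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The names `word_features`, `predict_boundaries`, `Tagger`, and the labels `B`/`N` and `PP`/`PW`/`NB` are hypothetical. Each tagger is a placeholder callable standing in for a trained CRF. The fixed-length trailing substring is only a crude proxy for a real morphological suffix feature. The rule that a PP boundary must coincide with a PW boundary is an assumption about the hierarchy. The punctuation-site determination model is omitted.

```python
from typing import Callable, Dict, List

Features = Dict[str, str]
# Hypothetical stand-in for a trained CRF: maps per-word features to labels
# "B" (boundary after this word) or "N" (no boundary).
Tagger = Callable[[List[Features]], List[str]]


def word_features(words: List[str], i: int, suffix_len: int = 3) -> Features:
    """Build features for word i, including a suffix feature.

    Assumption: the last `suffix_len` characters approximate the suffix;
    a real system would likely use a morphological analyzer instead.
    """
    word = words[i]
    return {
        "word": word,
        "suffix": word[-suffix_len:],
        "prev_word": words[i - 1] if i > 0 else "<BOS>",
        "next_word": words[i + 1] if i + 1 < len(words) else "<EOS>",
    }


def predict_boundaries(
    words: List[str], pw_tagger: Tagger, pp_tagger: Tagger
) -> List[str]:
    """Run the lower (PW) layer first, then feed its output to the PP layer."""
    # Layer 1: prosodic word boundaries from lexical and suffix features.
    pw_feats = [word_features(words, i) for i in range(len(words))]
    pw_labels = pw_tagger(pw_feats)

    # Layer 2: prosodic phrase boundaries, with the PW decision as a feature.
    pp_feats = [dict(f, pw_label=lab) for f, lab in zip(pw_feats, pw_labels)]
    pp_labels = pp_tagger(pp_feats)

    # Merge: a PP boundary is only kept where a PW boundary was predicted.
    merged = []
    for pw, pp in zip(pw_labels, pp_labels):
        if pw == "B" and pp == "B":
            merged.append("PP")
        elif pw == "B":
            merged.append("PW")
        else:
            merged.append("NB")
    return merged
```

Passing the lower-layer label as a feature lets the upper layer condition on it, and the final merge keeps the output consistent with the hierarchy even if the upper tagger proposes a phrase boundary inside a prosodic word.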
