Analysis and modeling of syllable duration for Thai speech synthesis

This paper presents an analysis of the factors that control Thai syllable duration, together with a statistical duration control model based on linear regression. The analyses were carried out at both the syllable level and the phrase level. At the syllable level, the effects of the five Thai tones and of syllable structure are investigated. To analyze syllable structure effects statistically, we applied quantification theory with two sets of linguistic factors: (1) the individual phone categories, and (2) categories grouped by articulatory similarity. At the phrase level, the effects of a syllable's position in the phrase and of the number of syllables in the phrase were analyzed. The experimental results showed that tone, syllable structure, and position in the phrase play significant roles in syllable duration control, whereas the number of syllables in the phrase affects syllable duration only slightly. These analysis results have been integrated into a statistical control model. The duration assignment precision of the proposed model was evaluated using 2480-word speech data. A total correlation of 0.73 between predicted and observed values for the test set samples indicates that the proposed control model achieves fair precision.
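The following is a minimal sketch of the kind of model described above, not the authors' implementation. It assumes that the categorical factors are dummy-coded and fitted by ordinary least squares, which is how regression on categorical factors (as in quantification theory type I) is commonly realized, and that precision is measured as the Pearson correlation between predicted and observed durations. The position categories, the function names, and all numeric values in the demo are hypothetical and chosen only for illustration; syllable structure and syllable count factors are omitted for brevity.

```python
import numpy as np

TONES = ["mid", "low", "falling", "high", "rising"]  # the five Thai tones
POSITIONS = ["initial", "medial", "final"]  # hypothetical phrase-position categories


def one_hot(labels, categories):
    """Dummy-code labels; the first category is dropped and acts as the baseline."""
    index = {c: i for i, c in enumerate(categories)}
    X = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        X[row, index[label]] = 1.0
    return X[:, 1:]


def design_matrix(tones, positions):
    """Intercept column followed by dummy-coded tone and position factors."""
    n = len(tones)
    return np.hstack(
        [np.ones((n, 1)), one_hot(tones, TONES), one_hot(positions, POSITIONS)]
    )


def fit(tones, positions, durations):
    """Least-squares estimate of the additive effect of each factor category."""
    X = design_matrix(tones, positions)
    y = np.asarray(durations, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(coef, tones, positions):
    """Predicted duration = baseline + tone effect + position effect."""
    return design_matrix(tones, positions) @ coef


def correlation(predicted, observed):
    """Pearson correlation between predicted and observed durations."""
    return float(np.corrcoef(predicted, observed)[0, 1])


if __name__ == "__main__":
    # Toy data with invented durations (ms); these are NOT values from the paper.
    tones = ["mid", "low", "falling", "high", "rising", "mid", "low", "rising"]
    positions = ["initial", "medial", "final", "medial", "final", "final", "initial", "medial"]
    durations = [250, 245, 330, 235, 325, 310, 262, 255]
    coef = fit(tones, positions, durations)
    print("coefficients:", np.round(coef, 1))
    print("correlation:", round(correlation(predict(coef, tones, positions), durations), 3))
```

In this sketch each coefficient is the estimated shift in duration, relative to the baseline category (mid tone, phrase-initial position), contributed by one factor category. In practice the correlation would be computed on held-out test samples rather than on the training data used in the demo.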