A Maximum Entropy Based Hierarchical Model for Automatic Prosodic Boundary Labeling in Mandarin

Modeling prosodic rhythm is of great importance for both speech synthesis and speech understanding, and it requires a sufficiently large corpus with precise prosodic boundary labels. This paper proposes a maximum entropy (ME) based hierarchical model that uses both text and acoustic features to label Mandarin prosodic boundaries automatically. Comparative experiments show that, for prosodic boundary detection, the ME model clearly outperforms classification and regression trees (CART), and that the bottom-up hierarchical framework is significantly superior to a flat single-level framework.
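
For reference, the standard conditional form of a maximum entropy classifier is sketched below. The notation is generic and assumed for illustration, not taken from this paper: x stands for the context at a candidate boundary (for example, its text and acoustic features), y for a boundary label, f_i for feature functions, and \lambda_i for their learned weights.

```latex
% Generic conditional maximum entropy model (notation assumed, not from the paper)
p(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{i} \lambda_i \, f_i(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{i} \lambda_i \, f_i(x, y') \Big)
```

Under this form, the predicted label for a candidate boundary is the y that maximizes p(y | x). How the hierarchical framework chains such classifiers across prosodic levels is not specified in the abstract and is left open here.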