Prosodic modeling with rich syntactic context in HMM-based Mandarin speech synthesis

To further explore the relationship between prosody and syntax, we propose a novel approach to prosodic modeling in HMM-based Mandarin speech synthesis that uses rich syntactic context instead of prosodic structure. Given the characteristics of Mandarin, both word-based and character-based syntactic parsing are investigated. The method avoids the cascading errors of conventional prosodic parameter prediction and does not rely on corpora manually annotated with prosodic structure. Experimental results show that, even though automatic syntactic parsing has limited precision, prosodic modeling with rich syntactic context still achieves significantly better performance than modeling based on manually annotated prosodic corpora, especially in the duration evaluation.
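As an illustration of what a "syntactic context" for a word could look like, the following minimal sketch reads a Penn-Treebank-style bracketed parse and derives three features per word: its part-of-speech tag, the label of its parent phrase, and its depth in the tree. The feature set, the function names (parse_bracketed, syntactic_contexts), and the example sentence are assumptions made for illustration; the abstract does not specify which syntactic features the authors actually use.

```python
def parse_bracketed(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def syntactic_contexts(tree):
    """Return one feature dict per word (hypothetical feature set)."""
    feats = []

    def walk(n, parent, depth):
        label, children = n
        # A preterminal node holds exactly one word; its label is the POS tag.
        if len(children) == 1 and isinstance(children[0], str):
            feats.append({"word": children[0], "pos": label,
                          "parent": parent, "depth": depth})
            return
        for c in children:
            if isinstance(c, tuple):
                walk(c, label, depth + 1)

    walk(tree, None, 0)
    return feats


if __name__ == "__main__":
    # Illustrative word-level parse of a short Mandarin sentence.
    tree = parse_bracketed("(IP (NP (NN 天气)) (VP (AD 很) (VA 好)))")
    for f in syntactic_contexts(tree):
        print(f)
```

In an HMM-based system, features of this kind would typically be appended to each unit's context label so that decision-tree clustering can ask questions about them; a character-based parse could be handled the same way, with characters as leaves.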
