Automatic Mandarin prosody boundary detecting based on tone nucleus features and DNN model

Automatic prosodic boundary detection and annotation are important for both speech understanding and natural speech synthesis, but manual annotation of prosodic boundary labels is laborious and time-consuming. In this paper, from the perspective of the interaction between adjacent tones, we propose a method for automatically detecting prosodic boundaries based on tone nucleus features and a deep neural network (DNN) model. The method first computes boundary-related parameters from the tone nucleus features and then models these parameters with a DNN. For comparison, a baseline system uses syllable-level acoustic features. Experimental results show that the proposed method, using tone nucleus features, outperforms the baseline system with a relative improvement of 4%, demonstrating the effectiveness of the proposed method.
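The two-stage pipeline described above (boundary-related parameters computed from adjacent tone nuclei, then classified by a neural network) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the four features (pitch reset, mean F0 difference, inter-nucleus gap, left nucleus duration), the dictionary layout of a nucleus, the single ReLU hidden layer, and the two-class softmax output. The weights are random and untrained, so the probabilities carry no meaning beyond showing the data flow.

```python
import numpy as np

def boundary_features(nuclei):
    """Compute illustrative juncture features between adjacent tone nuclei.

    nuclei: list of dicts with 'f0' (log-F0 samples inside the nucleus)
    and 'start' / 'end' times in seconds. This layout is hypothetical.
    """
    feats = []
    for left, right in zip(nuclei[:-1], nuclei[1:]):
        pitch_reset = right["f0"][0] - left["f0"][-1]      # F0 jump across the juncture
        mean_diff = right["f0"].mean() - left["f0"].mean()  # change in nucleus F0 level
        gap = right["start"] - left["end"]                  # time between the two nuclei
        left_dur = left["end"] - left["start"]              # duration of the left nucleus
        feats.append([pitch_reset, mean_diff, gap, left_dur])
    return np.array(feats)

def mlp_forward(x, weights, biases):
    """Forward pass of a feedforward network: ReLU hidden layers, softmax output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    logits = h @ weights[-1] + biases[-1]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Three made-up syllable nuclei; the long gap before the third suggests a break.
nuclei = [
    {"f0": np.array([5.0, 5.1, 5.2]), "start": 0.00, "end": 0.15},
    {"f0": np.array([5.3, 5.2, 5.0]), "start": 0.20, "end": 0.38},
    {"f0": np.array([5.4, 5.4, 5.3]), "start": 0.70, "end": 0.85},
]
feats = boundary_features(nuclei)  # one row per juncture

# Untrained random weights: 4 inputs -> 8 hidden units -> 2 classes.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
biases = [np.zeros(8), np.zeros(2)]
probs = mlp_forward(feats, weights, biases)  # columns: [no boundary, boundary]
```

In a real system the feature set, network depth, and training procedure would follow the paper's own definitions. The sketch only shows that each pair of adjacent syllables yields one feature vector and one boundary decision.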
