Mandarin pitch accent prediction using hierarchical model based ensemble machine learning

In this study, we combine characteristics of Mandarin with acoustic attributes and text information, and use hierarchical-model-based ensemble machine learning to predict Mandarin pitch accent. The model takes advantage of both the hierarchical structure of prosody and ensemble machine learning. Compared with classification and regression tree (CART), support vector machine (SVM), and AdaBoost with CART under different experimental conditions, the hierarchical model obtains the best results, reaching an accuracy of 84.75% on Mandarin read speech. We also compare our method with a previously proposed method on the same training and test sets; the absolute accuracy improvements are 2.25% and 0.82%.
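As a rough illustration of one of the baselines named above, boosting over CART-style learners, the following minimal sketch implements generic discrete AdaBoost with depth-1 trees (decision stumps) in plain Python. It is not the hierarchical model of this study. The one-dimensional toy data, the label convention (+1 for accented, -1 for unaccented), and all function names are assumptions made only for illustration.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    A stump predicts `sign` when row[feature] >= threshold, else `-sign`.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_fit(X, y, rounds=3):
    """Discrete AdaBoost: reweight samples so later stumps focus on errors."""
    n = len(X)
    w = [1.0 / n] * n              # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error to keep the logarithm finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this stump
        ensemble.append((alpha, stump))
        if stump[0] < 1e-10:       # a perfect stump: nothing left to reweight
            break
        # Increase weights of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps; ties go to the positive class."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one feature per syllable, +1 = accented, -1 = not.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(X, y, rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
accuracy = sum(p == yi for p, yi in zip(predictions, y)) / len(y)
```

On this toy set no single stump separates the two classes, but the weighted vote of three stumps fits all ten points. A real system would replace the toy feature with acoustic and textual features and evaluate on held-out data.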
