Research on text analysis for Tibetan statistical parametric speech synthesis

Text analysis is the front end of a text-to-speech (TTS) system and strongly influences the naturalness of the speech produced by the back end. Statistical parametric speech synthesis is now widely used and is becoming one of the main approaches to speech synthesis. In current Tibetan speech synthesis, however, front-end text analysis is often overlooked, so research on Tibetan text analysis remains at an early stage. This paper targets a Tibetan statistical parametric speech synthesis system. Our main work is to analyze the input Tibetan text in order to obtain the mono-phone labeling information and the context-dependent labeling information required by the back-end synthesizer. Finally, we synthesize speech from Tibetan text using the information obtained through text analysis, and we evaluate the quality of the synthetic speech with a mean opinion score (MOS) test on 50 randomly chosen sentences. The MOS reaches 4.0, which indicates that the synthetic speech has good naturalness and intelligibility and that the proposed method is effective.
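To illustrate the difference between the two kinds of labeling information mentioned above, the following is a minimal sketch of how mono-phone labels and context-dependent labels might be derived from a phone sequence. It assumes an HTS-style quinphone pattern (`ll^l-c+r=rr`) and a `sil` padding symbol; the abstract does not specify the label format actually used, and the function names and the phone symbols in the usage note are hypothetical placeholders, not a Tibetan phone set.

```python
from typing import List

# Assumed padding symbol for context positions outside the utterance.
PAD = "sil"


def monophone_labels(phones: List[str]) -> List[str]:
    """Return one context-free label per phone (the phone itself)."""
    return list(phones)


def context_labels(phones: List[str]) -> List[str]:
    """Return one quinphone-style label per phone, formatted ll^l-c+r=rr.

    Each label records the two preceding and two following phones
    around the current phone c. This is an illustrative format only.
    """
    padded = [PAD, PAD] + list(phones) + [PAD, PAD]
    labels = []
    for i in range(2, len(padded) - 2):
        ll, l, c, r, rr = padded[i - 2 : i + 3]
        labels.append(f"{ll}^{l}-{c}+{r}={rr}")
    return labels
```

For example, `context_labels(["k", "a", "m"])` yields one label per phone, each carrying its left and right neighbours, whereas `monophone_labels` returns the three phones unchanged. A full context-dependent label in a real system would normally also carry syllable, word, and prosodic features, which are omitted here.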
