A pitch-contour generation method combining ANN, global variance, and real-contour selection

Pitch contours are important for synthesizing highly natural speech. In this paper, we study a new pitch-contour generation method that combines an artificial neural network (ANN) prediction module with global-variance matching (GVM) and real-contour selection (RCS) modules. Each syllable's pitch contour is first analyzed and then transformed into a DCT-coefficient vector via the discrete cosine transform (DCT). The sequence of DCT vectors obtained from each training sentence, together with its contextual parameters, is then used to train the ANN weights and the GVM parameters. In pitch-contour generation experiments, variance-ratio (VR) values are measured for objective evaluation, and both the GVM and RCS modules are shown to raise VR values. In subjective evaluation, the ANN + GVM method is shown to be more natural than the ANN-only method. Moreover, the ANN + GVM + RCS method is shown to outperform ANN + GVTVL.
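The abstract names three computations: DCT encoding of each syllable's pitch contour, variance-ratio (VR) measurement, and global-variance matching. Below is a minimal Python sketch of these steps. It is not the authors' implementation, and several details are assumptions made for illustration. These include linear length normalization to 16 points, keeping 5 orthonormal DCT-II coefficients, and defining VR for one coefficient dimension as the variance of the generated values divided by the variance of the natural values. GVM is also simplified to a per-dimension rescaling around the sentence mean. The ANN and RCS modules are not modeled, and all function names are hypothetical.

```python
import math
from statistics import mean, pvariance

def resample(contour, m=16):
    """Linearly resample a frame-level pitch contour to m points (m >= 2).
    Length normalization is an assumption of this sketch."""
    if len(contour) == 1:
        return [float(contour[0])] * m
    out = []
    for j in range(m):
        pos = j * (len(contour) - 1) / (m - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def dct_ii(x):
    """Orthonormal DCT-II of a list of floats (O(N^2), pure Python)."""
    n = len(x)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        for k in range(n)
    ]

def idct_ii(c, n):
    """Inverse of dct_ii; missing high-order coefficients are treated as zero."""
    c = list(c) + [0.0] * (n - len(c))
    return [
        sum(c[k] * math.sqrt((1.0 if k == 0 else 2.0) / n)
            * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]

def encode_syllable(contour, m=16, n_coef=5):
    """Hypothetical encoder: resample, then keep the first n_coef DCT coefficients."""
    return dct_ii(resample(contour, m))[:n_coef]

def variance_ratio(generated, natural, d):
    """Assumed VR for dimension d: variance of generated / variance of natural,
    both taken across the syllables of one sentence."""
    return pvariance([v[d] for v in generated]) / pvariance([v[d] for v in natural])

def gv_match(vectors, target_gv):
    """Simplified GV matching: rescale each dimension around its sentence mean
    so that its variance across syllables equals target_gv[d]."""
    out = [list(v) for v in vectors]
    for d in range(len(vectors[0])):
        col = [v[d] for v in vectors]
        m, var = mean(col), pvariance(col)
        if var == 0:
            continue  # flat dimension: nothing to rescale
        g = math.sqrt(target_gv[d] / var)
        for row, x in zip(out, col):
            row[d] = m + g * (x - m)
    return out

if __name__ == "__main__":
    # Hypothetical F0 values (Hz) for three syllables of one sentence.
    contours = [[120, 128, 136, 140], [150, 142, 130, 118, 110], [105, 108, 112]]
    natural = [encode_syllable(c) for c in contours]
    # Stand-in for an over-smoothed ANN output: pull vectors halfway toward the mean.
    centre = [mean(col) for col in zip(*natural)]
    predicted = [[mc + 0.5 * (x - mc) for x, mc in zip(v, centre)] for v in natural]
    # A real system would average GV over training sentences; this uses one sentence.
    target_gv = [pvariance(col) for col in zip(*natural)]
    matched = gv_match(predicted, target_gv)
    print(variance_ratio(predicted, natural, 0), variance_ratio(matched, natural, 0))
```

Rescaling around the mean leaves each dimension's sentence-level average unchanged and only widens its spread. This is why VR rises toward 1 after matching in this sketch. The paper's GVM module may adjust the ANN outputs in a different way.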

Hung-Yan Gu | Kai-Wei Jiang
