A pitch-contour generation method combining ANN, global variance, and real-contour selection

Pitch contours are important for synthesizing highly natural speech signals. In this paper, we study a new pitch-contour generation method that combines an artificial neural network (ANN) prediction module with global-variance matching (GVM) and real-contour selection (RCS) modules. Each syllable pitch contour is first analyzed and then transformed into a vector of DCT coefficients via the discrete cosine transform (DCT). The sequence of DCT vectors analyzed from each training sentence, together with its contextual parameters, is then used to train the ANN weights and the GVM parameters. In pitch-contour generation experiments, we measure variance-ratio (VR) values for objective evaluation, and both the GVM and RCS modules are shown to increase VR values. In subjective evaluation, the ANN + GVM method is judged to produce more natural pitch contours than the ANN-only method. Moreover, the ANN + GVM + RCS method is shown to outperform the ANN + GVTVL method.
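
The abstract names several numeric steps (DCT coding of each syllable contour, global-variance matching, the VR measure, and real-contour selection) without giving formulas. The Python sketch below illustrates one plausible reading, under assumptions that are not taken from the paper: each contour has 16 pitch samples (`NUM_POINTS`) and is coded by its first 5 orthonormal DCT-II coefficients (`NUM_COEFFS`); VR is taken as var(generated) / var(reference) for each DCT dimension over a sentence; GVM is implemented as simple per-dimension variance scaling around the sentence mean, which may differ from the paper's GVM module; and RCS is modelled as a nearest-neighbour lookup of a real DCT vector. The ANN itself is replaced by an artificial, over-smoothed prediction, and all function names are hypothetical.

```python
# Sketch only: GVM as variance scaling and RCS as nearest-neighbour lookup are
# assumptions for illustration, not the paper's published algorithms.
import math
from statistics import mean, pvariance

NUM_POINTS = 16  # hypothetical: pitch samples per syllable contour
NUM_COEFFS = 5   # hypothetical: DCT coefficients kept per syllable


def _norm(k, n):
    # Normalization factor that makes the DCT-II orthonormal.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(contour, num_coeffs=NUM_COEFFS):
    """Orthonormal DCT-II of one syllable contour, keeping the first num_coeffs terms."""
    n = len(contour)
    return [
        _norm(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                          for i, x in enumerate(contour))
        for k in range(num_coeffs)
    ]


def idct(coeffs, n=NUM_POINTS):
    """Inverse orthonormal DCT; coefficients beyond len(coeffs) are treated as zero."""
    return [
        sum(_norm(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def variance_ratio(generated, reference):
    """Assumed VR definition: var(generated) / var(reference)."""
    return pvariance(generated) / pvariance(reference)


def match_global_variance(vectors, target_vars):
    """Hypothetical GVM: rescale each DCT dimension across the sentence around
    its mean so that its variance equals target_vars[d]."""
    out = [list(v) for v in vectors]
    for d in range(len(vectors[0])):
        column = [v[d] for v in vectors]
        mu, var = mean(column), pvariance(column)
        if var == 0:
            continue  # constant dimension: no spread to rescale
        gain = math.sqrt(target_vars[d] / var)
        for row, x in zip(out, column):
            row[d] = mu + gain * (x - mu)
    return out


def select_real_contour(predicted, candidates):
    """Hypothetical RCS: return the real DCT vector nearest (Euclidean) to the prediction."""
    return min(candidates, key=lambda c: math.dist(c, predicted))


if __name__ == "__main__":
    # Toy three-syllable "sentence" with made-up pitch values in Hz.
    contours = [
        [200 + 20 * math.sin(math.pi * i / 15) for i in range(NUM_POINTS)],
        [180 - 2 * i for i in range(NUM_POINTS)],
        [170 + i for i in range(NUM_POINTS)],
    ]
    real = [dct(c) for c in contours]
    target_vars = [pvariance([v[d] for v in real]) for d in range(NUM_COEFFS)]

    # Stand-in for an over-smoothed ANN output: halve every deviation from the mean.
    means = [mean([v[d] for v in real]) for d in range(NUM_COEFFS)]
    predicted = [[m + 0.5 * (x - m) for x, m in zip(v, means)] for v in real]
    matched = match_global_variance(predicted, target_vars)

    for d in range(NUM_COEFFS):
        ref = [v[d] for v in real]
        vr_before = variance_ratio([v[d] for v in predicted], ref)
        vr_after = variance_ratio([v[d] for v in matched], ref)
        print(f"dim {d}: VR {vr_before:.3f} -> {vr_after:.3f}")

    # Select the closest real DCT vector and decode it back to a pitch contour.
    chosen = select_real_contour(matched[1], real)
    print([round(f0, 1) for f0 in idct(chosen)])
```

Halving every deviation drops VR to 0.25 in each dimension, and the variance scaling brings it back to 1.0, which mirrors the VR gain the abstract attributes to GVM. Variance scaling is used here only because it is the simplest form of global-variance compensation; decoding the selected vector with 5 of 16 coefficients yields a smoothed version of the real contour.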
