Speech Parameter Generation Algorithm Considering Modulation Spectrum for Statistical Parametric Speech Synthesis

This paper proposes a novel speech parameter generation algorithm that considers the modulation spectrum for statistical parametric speech synthesis. The over-smoothing effect observed in generated speech parameter trajectories degrades synthetic speech quality. A parameter generation algorithm considering the Global Variance (GV) is known to be an effective way to alleviate the over-smoothing effect, but it does not eliminate it. Recently, we have found that the Modulation Spectrum (MS), which can be regarded as an extension of the GV, detects the over-smoothing effect more sensitively than the GV does. To further alleviate the over-smoothing effect, the proposed algorithm integrates the MS into parameter generation. The experimental results demonstrate that the proposed algorithm considering the MS yields significant improvements in synthetic speech quality over the conventional parameter generation algorithm considering the GV.
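The relationship between the GV and the MS can be illustrated with a minimal sketch. It is not the proposed generation algorithm. It assumes the MS is computed as the log-scaled power spectrum of each parameter dimension's temporal trajectory, a definition the abstract does not state. The function names, the FFT length, the random test trajectory, and the moving-average filter standing in for over-smoothing are all hypothetical choices made for illustration.

```python
import numpy as np


def global_variance(traj):
    """Per-dimension variance over time of a (T, D) parameter trajectory."""
    return traj.var(axis=0)


def modulation_spectrum(traj, n_fft=256, eps=1e-12):
    """Assumed MS definition: log power spectrum of each dimension's
    mean-removed temporal trajectory. Returns shape (n_fft // 2 + 1, D)."""
    centered = traj - traj.mean(axis=0, keepdims=True)
    spec = np.fft.rfft(centered, n=n_fft, axis=0)  # zero-pads to n_fft frames
    return np.log(np.abs(spec) ** 2 + eps)


def moving_average(traj, width=5):
    """Hypothetical stand-in for over-smoothing: temporal moving average."""
    kernel = np.ones(width) / width
    cols = [np.convolve(traj[:, d], kernel, mode="same")
            for d in range(traj.shape[1])]
    return np.stack(cols, axis=1)


# Synthetic "natural" trajectory (T=200 frames, D=3 dimensions); illustrative only.
rng = np.random.default_rng(0)
natural = rng.standard_normal((200, 3))
smoothed = moving_average(natural)

gv_nat, gv_smooth = global_variance(natural), global_variance(smoothed)
ms_nat, ms_smooth = modulation_spectrum(natural), modulation_spectrum(smoothed)

# Average MS over the upper half of the modulation-frequency axis.
half = ms_nat.shape[0] // 2
high_nat = ms_nat[half:].mean(axis=0)
high_smooth = ms_smooth[half:].mean(axis=0)
```

In this sketch the GV collapses each dimension's temporal fluctuation into a single value, whereas the MS resolves the same fluctuation by modulation frequency, so it shows in which bands the smoothed trajectory has lost power.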
