Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP

This paper describes a statistical spectral parameter emphasis technique for HMM-based speech synthesis using mel-scaled line spectral pairs (mel-LSP). Spectral parameter emphasis is effective for compensating for the over-smoothed spectra produced by HMM-based speech synthesis. However, no conventional technique for mel-LSP satisfies both of two requirements: automatic tuning for different speakers and real-time synthesis. In the proposed method, a cumulative distribution function (CDF) is calculated from the histogram of spectral parameters extracted from the training speech data. The CDF of spectral parameters generated from the HMMs is constructed in the same manner. An emphasis rule is then trained so that the CDF of the generated parameters equals that of the training data. After a spectral parameter sequence is generated from the HMMs, it is emphasized by applying this rule. Experimental results show that the proposed method improves speech quality.
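The CDF-matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the rule is a per-dimension lookup table built by classic histogram equalization, and the function names (`fit_emphasis_rule`, `apply_emphasis_rule`), the bin count, and the small `floor` added to the histogram counts are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_emphasis_rule(generated, natural, n_bins=256, floor=1e-6):
    """Fit a monotonic mapping for ONE parameter dimension by CDF matching.

    generated: 1-D array of values produced by the model (assumed input).
    natural:   1-D array of values extracted from training speech (assumed input).
    Returns (edges, mapped): a lookup table where mapped[i] is the value whose
    natural-data CDF equals the generated-data CDF at edges[i].
    """
    generated = np.asarray(generated, dtype=float)
    natural = np.asarray(natural, dtype=float)

    # Shared bin edges covering the range of both data sets.
    lo = min(generated.min(), natural.min())
    hi = max(generated.max(), natural.max())
    edges = np.linspace(lo, hi, n_bins + 1)

    # Histograms; the small floor keeps both CDFs strictly increasing
    # so that the inverse lookup below is well defined (an assumption
    # of this sketch, not something stated in the abstract).
    gen_hist = np.histogram(generated, bins=edges)[0].astype(float) + floor
    nat_hist = np.histogram(natural, bins=edges)[0].astype(float) + floor

    # CDF values at every bin edge, starting from 0 at the lowest edge.
    gen_cdf = np.concatenate(([0.0], np.cumsum(gen_hist) / gen_hist.sum()))
    nat_cdf = np.concatenate(([0.0], np.cumsum(nat_hist) / nat_hist.sum()))

    # For each edge x, find y with nat_cdf(y) == gen_cdf(x) by inverting
    # the natural CDF through linear interpolation.
    mapped = np.interp(gen_cdf, nat_cdf, edges)
    return edges, mapped


def apply_emphasis_rule(values, edges, mapped):
    """Apply the fitted lookup table; values outside the table are clamped."""
    return np.interp(np.asarray(values, dtype=float), edges, mapped)
```

Under these assumptions the table is fitted once offline, and applying it at synthesis time costs one interpolation per parameter value, which is consistent with the real-time requirement named in the abstract. For a multi-dimensional parameter vector, one table per dimension would be fitted in the same way.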
