Speech parameter generation considering LSP ordering property for HMM-based speech synthesis

LSPs have many advantages for speech representation; in particular, they correlate well with spectral formants, provided that the LSP parameters are strictly ordered and bounded. When LSPs are adopted as the spectral feature in HMM-based speech synthesis, this ordering property cannot be guaranteed, because diagonal covariance matrices are used and the correlation between LSP dimensions is ignored, which causes instability problems in the synthesized speech. In this paper, we modify the parameter generation criterion to preserve the ordering property of the generated LSPs by considering not only the HMM and global variance (GV) likelihoods maximized in the conventional method but also a penalty on mis-orderings. Experimental results show that the proposed method significantly reduces mis-orderings and achieves high-quality synthesis when the penalty weight is selected appropriately.
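As a rough illustration of the idea, the sketch below detects mis-orderings in a frame of LSPs and subtracts a weighted penalty from a likelihood score. The abstract does not give the exact form of the penalty, so the squared hinge on adjacent differences, the function names (`misordering_penalty`, `count_misorderings`, `penalized_objective`), and the `min_gap` parameter are all assumptions made for this example, not the formulation used in the paper. LSPs are assumed to be in radians, bounded by 0 and π.

```python
import math


def _with_bounds(lsp_frame):
    # Pad the frame with the assumed lower and upper bounds (0 and pi radians).
    return [0.0] + list(lsp_frame) + [math.pi]


def count_misorderings(lsp_frame):
    """Count adjacent pairs (bounds included) that break strict ordering."""
    bounded = _with_bounds(lsp_frame)
    return sum(1 for lower, upper in zip(bounded, bounded[1:]) if upper <= lower)


def misordering_penalty(lsp_frame, min_gap=0.0):
    """Squared-hinge penalty (an assumed form): zero when every adjacent
    difference is at least min_gap, growing as the violation gets larger."""
    bounded = _with_bounds(lsp_frame)
    penalty = 0.0
    for lower, upper in zip(bounded, bounded[1:]):
        violation = min_gap - (upper - lower)
        if violation > 0.0:
            penalty += violation ** 2
    return penalty


def penalized_objective(log_likelihood, trajectory, weight):
    """Combine a likelihood score (HMM plus GV in the conventional criterion)
    with the weighted mis-ordering penalty summed over all frames."""
    total_penalty = sum(misordering_penalty(frame) for frame in trajectory)
    return log_likelihood - weight * total_penalty


ordered = [0.3, 0.8, 1.5, 2.4]   # strictly increasing, inside (0, pi)
swapped = [0.3, 1.5, 0.8, 2.4]   # second and third LSPs are out of order
```

With `weight` set to zero the objective reduces to the likelihood alone. A larger weight discourages mis-ordered trajectories more strongly, but it trades against the likelihood terms, which is consistent with the observation that the penalty weight has to be selected appropriately.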
